Malus Unigene v2

Overview
Analysis Name  Malus Unigene v2
Unigene Name  Malus Unigene v2
Method  CAP3
Source  GenBank Malus ESTs (Dec 19, 2005)
Date performed  2005-12-19

Many sequencing projects around the world are depositing ESTs from the genus Malus in the NCBI dbEST database. However, not all of these ESTs are of high quality. To filter them, we cross-matched the public sequences against NCBI's UniVec database and used the BLAST sequence similarity algorithm to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences.

To reduce redundancy and create longer transcripts, we assembled these ESTs using the CAP3 program [1]. For some sequences, we were able to obtain the original trace files and incorporate the phred quality values for each base into the assembly. We are in the process of annotating the final assembly through homology to Swiss-Prot, Arabidopsis, and NCBI nr proteins.

For more information on this project please contact the GDR development team.

All Malus ESTs available in GenBank on December 19th, 2005 were included in this assembly. CAP3 was run with the parameter -p 90. CAP3 outputs assembled contigs and singlets. The number of tentative unigenes for this assembly is the combined number of contigs and singlets.

 

 Processing Summary
 Number of ESTs available  202,888
 Number of ESTs available after filtering  195,553
 Average Length  611.7
 Number of Contigs (CAP3 Assembly, -p 90)  22,435
 Average Length of Contigs  848.5
 Number of Singlets  45,556
 Number of Putative Unigenes  67,991
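
The figures in the Processing Summary are related by simple arithmetic: the putative unigene count is the number of contigs plus the number of singlets. A minimal Python check using the values from the table (the variable names are illustrative only):

```python
# Values copied from the Processing Summary table.
ests_available = 202_888
ests_after_filtering = 195_553
contigs = 22_435
singlets = 45_556

# Putative unigenes are the contigs and singlets combined.
putative_unigenes = contigs + singlets

# ESTs removed by the filtering step described in the Overview.
ests_removed = ests_available - ests_after_filtering

print(putative_unigenes)  # 67991
print(ests_removed)       # 7335
```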

 

References

  1. Huang, X. and Madan, A. (1999). CAP3: A DNA sequence assembly program. Genome Research, 9:868-877.
  2. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., and Schneider, M. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research, 31:365-370.
  3. Pearson, W.R. and Lipman, D.J. (1988). Improved tools for biological sequence comparison. Proc Natl Acad Sci USA, 85:2444-2448.

 

Library Information
The Malus ESTs used for this assembly were downloaded on December 19th, 2005.

 

 EST Libraries
 Number of ESTs available  202,888
 # of Species  3
 # of Libraries  89
 # of Tissues  41
 # of Development Stages  31


 Species
 Malus sieboldii  1,163
 Malus x domestica  197,781
 Malus x domestica x Malus sieversii  3,944
 

 

Homology

Homology was determined by comparing the Malus contigs and singlets against the Swiss-Prot and TrEMBL databases with the BLASTx algorithm. Only matches with an E-value of 1.0e-9 or better (that is, 1.0e-9 or lower) were recorded as significant. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy. TrEMBL is a computer-annotated supplement to Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.
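
The significance cutoff can be illustrated with a short Python sketch. It assumes the BLAST results are in the standard 12-column tab-delimited format, with the E-value in the eleventh column; the source does not state which output format was used, and the function name and the example identifiers below are hypothetical.

```python
E_VALUE_CUTOFF = 1.0e-9  # threshold stated in the text

def significant_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) for hits at or below the cutoff.

    Assumes 12-column tabular BLAST output (an assumption, not
    confirmed by the source): column 1 is the query, column 2 the
    subject, and column 11 the E-value.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue <= cutoff:
            yield query, subject, evalue

# Hypothetical example records, not real results.
example = [
    "Contig1\tP12345\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t190",
    "Contig2\tQ67890\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.002\t35",
]
hits = list(significant_hits(example))
print(hits)
```

Only the first record passes, because 0.002 is larger than the 1.0e-9 cutoff.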

 Homology of Malus Contigs
 Number of Contigs  22,435
 Number (%) of Contigs with a Match in Swiss-Prot Database  12,401 (55.3%)
 Number (%) of Contigs with a Match in TrEMBL Database  19,316 (86.1%)

 

 Homology of Malus Singlets
 Number of Singlets  45,556
 Number (%) of Singlets with a Match in Swiss-Prot Database  14,698 (32.3%)
 Number (%) of Singlets with a Match in TrEMBL Database  20,673 (45.4%)
 

 

Microsatellite Analysis

The type and frequency of simple sequence repeats in this unigene assembly (v2) were determined using the CUGIssr.pl program. For these searches, SSRs are defined as dinucleotides repeated at least 5 times, trinucleotides repeated at least 4 times, tetranucleotides repeated at least 3 times, or pentanucleotides repeated at least 3 times.
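
The repeat thresholds above can be expressed as a small regular-expression search. The following Python sketch is not the CUGIssr.pl implementation; it is an illustration of the stated SSR definition under simplifying assumptions (uppercase A/C/G/T only, perfect repeats only, and no special handling of overlapping repeats). The function names are hypothetical.

```python
import re

# Minimum repeat counts from the SSR definition in the text:
# motif length (bp) -> minimum number of repeats.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}

def is_repeat_of_shorter_unit(motif):
    """True if the motif is itself built from a shorter unit (e.g. 'ATAT', 'AAA')."""
    n = len(motif)
    for size in range(1, n):
        if n % size == 0 and motif[:size] * (n // size) == motif:
            return True
    return False

def find_ssrs(sequence):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    sequence = sequence.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # e.g. do not report (AG)6 again as (AGAG)3
            repeats = len(match.group(0)) // unit_len
            found.append((match.start(), motif, repeats))
    return sorted(found)

demo = "GGC" + "AG" * 6 + "TTC" + "CAT" * 4 + "G"
print(find_ssrs(demo))  # [(3, 'AG', 6), (18, 'CAT', 4)]
```

Skipping motifs that are repeats of a shorter unit keeps a dinucleotide run from being counted a second time as a tetranucleotide; whether CUGIssr.pl applies the same rule is not stated in the source.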

 

 Sequence information
 Number of Sequences  195,553
 Number of Sequences Having One Or More SSRs  35,504
 Percentage of Sequences Having One Or More SSRs  18.2%
 Total Number of SSRs Found  44,472
 Number of Motifs  636

 

Frequency of Motif Type

 

 Motif Length  Frequency  Percentage Frequency
 2bp  20,647  46.4%
 3bp  16,846  37.9%
 4bp  5,562  12.5%
 5bp  1,417  3.2%
 

 

eSNP Summary

The type and frequency of single nucleotide polymorphisms in this unigene assembly (v2) were determined using the autoSNP software package (Savage et al., 2005).


 

 SNP Summary
 Number of Contigs  22,435
 Number of SNPs  10,426
 Consensus Size  19,101,680 bp
 SNP Frequency  0.05/100 bp
 Total Transitions  4,759
 Total Transversions  2,732
 Total Indels  2,935
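
The categories in the SNP Summary can be illustrated with a short Python sketch. It is not part of autoSNP; it only shows the standard distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions, and recomputes the frequency from the table values. The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(base_a, base_b):
    """Classify a single-base substitution as a transition or a transversion."""
    a, b = base_a.upper(), base_b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if a == b:
        raise ValueError("bases are identical; not a substitution")
    # Both purines or both pyrimidines: transition. Otherwise: transversion.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"

def snps_per_100bp(snp_count, consensus_bp):
    """SNP frequency expressed per 100 bp of consensus sequence."""
    return 100.0 * snp_count / consensus_bp

# Values copied from the SNP Summary table.
transitions, transversions, indels = 4_759, 2_732, 2_935
total_snps = transitions + transversions + indels
frequency = round(snps_per_100bp(total_snps, 19_101_680), 2)

print(total_snps)  # 10426
print(frequency)   # 0.05
```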
 

 

Contact
 Contact Details
 Name  Main, Dorrie
 Lab  Department of Horticulture
 Organization  Washington State University
 Address  45 Johnson Hall, Pullman, WA 99164
 Telephone  509-335-2774
 Fax  509-335-8690
 Email  dorrie@wsu.edu

 

Publication

No publications are currently available.
