Overview
Analysis Name | Malus Unigene v2 |
Method | CAP3 |
Source | Genbank Malus ESTs (Dec 19, 2005) |
Date performed | 2005-12-19 |
Many sequencing projects around the world are depositing ESTs from the genus Malus in the NCBI dbEST database. However, not all of these ESTs are of high quality. To filter the set, we cross-matched the public sequences against NCBI's UniVec database and used the BLAST sequence similarity algorithm to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences.
To reduce redundancy and create longer transcripts, we assembled these ESTs using the CAP3 program (Huang and Madan, 1999). For some sequences, we were able to obtain the original trace files and incorporate the phred quality values for each base into the assembly. We are in the process of annotating the final assembly through homology to Swiss-Prot, Arabidopsis, and NCBI nr proteins.
For more information on this project please contact the GDR development team.
All Malus ESTs available in GenBank on December 19, 2005 were included in this assembly. CAP3 was run with the parameter -p 90 (an overlap percent identity cutoff of 90). CAP3 produces assembled contigs and singlets; the number of tentative unigenes for this assembly is the combined number of contigs and singlets.
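As a minimal sketch of that tally, assuming the CAP3 contigs and singlets are available as FASTA text (CAP3 writes them to files ending in `.cap.contigs` and `.cap.singlets`), the unigene count is the number of FASTA records in the two files combined. The function names and toy sequences below are illustrative only and are not part of the GDR pipeline.

```python
def count_fasta_records(fasta_text: str) -> int:
    """Count sequences in FASTA-formatted text by counting header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def count_unigenes(contigs_fasta: str, singlets_fasta: str) -> int:
    """Putative unigenes = assembled contigs + unassembled singlets."""
    return count_fasta_records(contigs_fasta) + count_fasta_records(singlets_fasta)


# Toy stand-ins for the contents of the .cap.contigs and .cap.singlets files.
toy_contigs = ">Contig1\nACGTACGT\n>Contig2\nGGCCTTAA\n"
toy_singlets = ">EST_A\nTTTTCCCC\n"
unigene_total = count_unigenes(toy_contigs, toy_singlets)
```

With real output, the two strings would be read from the contig and singlet files produced by the assembly run.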
Processing Summary
Number of ESTs available | 202,888 |
Number of ESTs available after filtering | 195,553 |
Average Length (bp) | 611.7 |
Number of Contigs (CAP3 Assembly, -p 90) | 22,435 |
Average Length of Contigs (bp) | 848.5 |
Number of Singlets | 45,556 |
Number of Putative Unigenes | 67,991 |
References
- Huang, X. and Madan, A. (1999). CAP3: A DNA sequence assembly program. Genome Research, 9:868-877.
- Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., and Schneider, M. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research, 31:365-370.
- Pearson, W.R. and Lipman, D.J. (1988). Improved tools for biological sequence comparison. Proc Natl Acad Sci USA, 85:2444-2448.
Homology
Homology was determined using the BLASTx algorithm to compare the Malus contigs and singlets against the Swiss-Prot and TrEMBL databases. Only matches with an E-value of 1.0e-9 or lower were recorded as significant. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy, and TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.
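The significance cutoff can be illustrated with a short filter. This is a sketch only: it assumes the BLASTx results are in the 12-column tab-separated tabular format with the E-value in column 11, which the text does not specify, and the query and subject identifiers in the toy data are made up.

```python
E_VALUE_CUTOFF = 1.0e-9  # threshold stated in the text


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) for hits at or below the cutoff.

    Assumes 12-column tab-separated BLAST output with the E-value in
    column 11 (an assumption, not confirmed by the source).
    """
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= cutoff:
            yield fields[0], fields[1], evalue


# Hypothetical hits: one strong match and one weak match.
toy_output = [
    "Contig1\tP12345\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t210",
    "Contig2\tQ67890\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.002\t35.0",
]
hits = list(significant_hits(toy_output))
```

Only the first toy hit passes, since 0.002 is well above the 1.0e-9 cutoff.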
Microsatellite Analysis
The type and frequency of simple sequence repeats in this unigene assembly (v2) were determined using the CUGIssr.pl program. For these searches, SSRs are defined as dinucleotides repeated at least 5 times, trinucleotides repeated at least 4 times, tetranucleotides repeated at least 3 times, or pentanucleotides repeated at least 3 times.
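The SSR definition above can be sketched with regular expressions. This is not the CUGIssr.pl program and may differ from it in detail. In particular, skipping motifs that are themselves repeats of a shorter unit (such as AA or ATAT) is an assumption made here, and the function names are illustrative.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}


def is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. AA, ATAT)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence: str):
    """Return sorted (start, motif, repeat_count) tuples for SSRs in the sequence."""
    sequence = sequence.upper()
    found = []
    for motif_len, min_repeats in MIN_REPEATS.items():
        # One copy of the motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_reducible(motif):
                continue
            found.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(found)


# Toy sequence containing (GA)5 and (CAT)4.
toy_ssrs = find_ssrs("TTGAGAGAGAGACCATCATCATCATG")
```

Start positions are 0-based offsets into the sequence.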
Sequence information
Number of Sequences | 195,553 |
Number of Sequences Having One Or More SSRs | 35,504 |
Percentage of Sequences Having One Or More SSRs | 18.2% |
Total Number of SSRs Found | 44,472 |
Number of Motifs | 636 |
Frequency of Motif Type
Motif Length | Frequency | Percentage Frequency |
2 bp | 20,647 | 46.4% |
3 bp | 16,846 | 37.9% |
4 bp | 5,562 | 12.5% |
5 bp | 1,417 | 3.2% |
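The percentage column can be recomputed from the counts: each percentage is the count for that motif length divided by the total number of SSRs. The snippet below is a small check using the counts from the table; the variable names are illustrative.

```python
# Counts per motif length, copied from the table above.
motif_counts = {"2bp": 20647, "3bp": 16846, "4bp": 5562, "5bp": 1417}

# The counts should add up to the total number of SSRs found.
total_ssrs = sum(motif_counts.values())

# Percentage frequency of each motif length, rounded to one decimal place.
percentages = {
    motif: round(100.0 * count / total_ssrs, 1)
    for motif, count in motif_counts.items()
}
```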
Publication
No publications are currently available.
Downloads
Sequence Files:
Blast Result Files:
Microsatellite Files:
ESNP Summary
The type and frequency of single nucleotide polymorphisms in this unigene assembly (v2) were determined using the autoSNP software package (Savage et al., 2005).
SNP Summary
Number of Contigs | 22,435 |
Number of SNPs | 10,426 |
Consensus Size | 19,101,680 bp |
SNP Frequency | 0.05/100 bp |
Total Transitions | 4,759 |
Total Transversions | 2,732 |
Total Indels | 2,935 |
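The summary figures are related by simple arithmetic: the three SNP classes sum to the SNP total, and the SNP frequency is that total divided by the consensus size. The sketch below recomputes them and shows how a single-base difference falls into each class. The `classify_snp` helper and its use of "-" for a gap are assumptions for illustration, not a description of how autoSNP works.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Classify a single-base difference; '-' marks a gap (assumed convention)."""
    if "-" in (ref, alt):
        return "indel"
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"  # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"  # purine<->pyrimidine


# Values copied from the SNP summary.
consensus_size_bp = 19_101_680
snp_counts = {"transitions": 4_759, "transversions": 2_732, "indels": 2_935}

total_snps = sum(snp_counts.values())
snps_per_100_bp = 100.0 * total_snps / consensus_size_bp
ts_tv_ratio = snp_counts["transitions"] / snp_counts["transversions"]
```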