Overview
Analysis Name | Fragaria Unigene v5.0 |
Method | CAP3 |
Source | GenBank Fragaria ESTs (July 2012) |
Date performed | 2012-12-19 |
This is the fifth version of the Fragaria unigene. Many sequencing projects around the world are depositing ESTs from the genus Fragaria in the NCBI dbEST database. The Fragaria ESTs included in this assembly were downloaded on July 1, 2012.
Not all of the Fragaria ESTs are of high quality. To filter them, we screened the public sequences against NCBI's UniVec database with cross_match and used the BLAST [2] sequence similarity algorithm to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences. To reduce redundancy and create longer transcripts, we assembled the remaining ESTs with the CAP3 program [1]. The final assembly was annotated by BLAST sequence similarity searches against Swiss-Prot and TrEMBL [3], TAIR Arabidopsis proteins [4], and the proteins of Prunus persica [5], Populus trichocarpa [6], and Vitis vinifera [7].
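The filtering described above can be sketched in Python. This is a minimal, hypothetical example of the final removal step only. It assumes the IDs of vector-contaminated, organellar, tRNA, or rRNA ESTs have already been collected into a set, for example from cross_match and BLAST output. The screening itself is not shown, and the function names read_fasta and remove_flagged are illustrative, not part of the original pipeline.

```python
# Hypothetical sketch of the removal step only: the set of flagged IDs is
# assumed to come from earlier cross_match / BLAST screening (not shown).
from typing import Dict, Iterable, List, Optional, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {id: sequence}; the ID is the first word of the header."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def remove_flagged(records: Dict[str, str], flagged: Set[str]) -> Dict[str, str]:
    """Keep only the ESTs whose IDs were not flagged as contaminants."""
    return {seq_id: seq for seq_id, seq in records.items() if seq_id not in flagged}


example = """>EST1 leaf library
ACGTACGTAC
GTAC
>EST2 chloroplast-like
GGGGCCCCAA
>EST3
TTTTAAAACC
""".splitlines()

kept = remove_flagged(read_fasta(example), flagged={"EST2"})
print(sorted(kept))  # ['EST1', 'EST3']
```

A dictionary keyed on the header's first word keeps the lookup against the flagged set constant-time, which matters little for a toy input but scales to tens of thousands of ESTs.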
Processing Summary |
Number of ESTs available | 58,295 |
Number of ESTs available after filtering | 55,513 |
Average EST length (bp) | 627 |
Number of contigs (CAP3 assembly, -p 90) | 6,226 |
Average length of contigs (bp) | 952 |
Number of singlets | 12,008 |
Number of putative unigenes | 18,234 |
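The summary counts are internally consistent: the putative unigenes are the contigs plus the singlets. The short Python check below uses only the numbers reported in the table. The variable names are illustrative.

```python
# Values copied from the Processing Summary table.
ests_available = 58_295
ests_after_filtering = 55_513
contigs = 6_226
singlets = 12_008

# Every contig and every singlet counts as one putative unigene.
putative_unigenes = contigs + singlets
removed_by_filtering = ests_available - ests_after_filtering

print(putative_unigenes)  # 18234
print(removed_by_filtering)  # 2782
print(f"{removed_by_filtering / ests_available:.2%}")  # 4.77%
```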
References
1. Huang X, Madan A. (1999) CAP3: A DNA sequence assembly program. Genome Research. 9:868-877.
2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol. 215(3):403-10.
3. Boeckmann B, Bairoch A, Apweiler R, Blatter M-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research. 31:365-370.
4. Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P. (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Research. 31(1):224-8.
5. http://services.appliedgenomics.org/projects/drupomics/
6. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, Davis J, Degroeve S, Déjardin A, Depamphilis C, Detter J, Dirks B, Dubchak I, Duplessis S, Ehlting J, Ellis B, Gendler K, Goodstein D, Gribskov M, Grimwood J, Groover A, Gunter L, Hamberger B, Heinze B, Helariutta Y, Henrissat B, Holligan D, Holt R, Huang W, Islam-Faridi N, Jones S, Jones-Rhoades M, Jorgensen R, Joshi C, Kangasjärvi J, Karlsson J, Kelleher C, Kirkpatrick R, Kirst M, Kohler A, Kalluri U, Larimer F, Leebens-Mack J, Leplé JC, Locascio P, Lou Y, Lucas S, Martin F, Montanini B, Napoli C, Nelson DR, Nelson C, Nieminen K, Nilsson O, Pereda V, Peter G, Philippe R, Pilate G, Poliakov A, Razumovskaya J, Richardson P, Rinaldi C, Ritland K, Rouzé P, Ryaboy D, Schmutz J, Schrader J, Segerman B, Shin H, Siddiqui A, Sterky F, Terry A, Tsai CJ, Uberbacher E, Unneberg P, Vahala J, Wall K, Wessler S, Yang G, Yin T, Douglas C, Marra M, Sandberg G, Van de Peer Y, Rokhsar D. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. Sep 15; 313(5793):1596-604.
7. French-Italian Public Consortium for Grapevine Genome Characterization. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. Sep 27; 449(7161):463-7.
Homology
Homology was determined by comparing the Fragaria contigs and singlets with the BLASTx algorithm against Swiss-Prot, TrEMBL, the TAIR Arabidopsis proteins, and the Prunus persica, Populus trichocarpa, and Vitis vinifera proteins. Only matches with an E-value of 1e-6 or better were recorded. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy. TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot. The Fragaria homology results can be downloaded as an Excel spreadsheet from the Downloads section.
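Applying the E-value cutoff to BLASTx results can be sketched in Python as follows. This is a hedged illustration, not the pipeline used for this build. It assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns, where the E-value is the 11th column. The function name significant_hits, the contig and singlet IDs, and the accession-like subject IDs in the example rows are all made up.

```python
# Sketch: keep BLASTX hits whose E-value is 1e-6 or better (i.e. <= 1e-6).
# Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
from typing import Dict, Iterable, List, Tuple

EVALUE_CUTOFF = 1e-6
EVALUE_COLUMN = 10  # 0-based index of the evalue column


def significant_hits(
    lines: Iterable[str], cutoff: float = EVALUE_CUTOFF
) -> Dict[str, List[Tuple[str, float]]]:
    """Group hits that pass the E-value cutoff by query (contig or singlet) ID."""
    hits: Dict[str, List[Tuple[str, float]]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[EVALUE_COLUMN])
        if evalue <= cutoff:
            hits.setdefault(query, []).append((subject, evalue))
    return hits


# Made-up example rows (tab-separated).
rows = [
    "contig1\tP12345\t82.1\t210\t37\t1\t3\t632\t5\t214\t2e-45\t180",
    "contig1\tQ67890\t35.0\t60\t39\t0\t10\t189\t1\t60\t0.003\t38",
    "singlet7\tO11111\t61.5\t120\t46\t0\t1\t360\t1\t120\t1e-06\t95",
]
print(significant_hits(rows))
# {'contig1': [('P12345', 2e-45)], 'singlet7': [('O11111', 1e-06)]}
```

"Or better" is read here as "less than or equal to", so a hit at exactly 1e-6 is kept.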
Microsatellite Analysis
The type and frequency of simple sequence repeats (SSRs) in the Fragaria unigene v5.0 contigs were determined using the MainLabssr.pl program. For these searches, SSRs are defined as dinucleotides repeated at least 5 times, trinucleotides repeated at least 4 times, tetranucleotides repeated at least 3 times, or pentanucleotides repeated at least 3 times. The SSRs of the Fragaria unigene v5.0 contigs can be downloaded from the Downloads section.
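The SSR definition above can be illustrated with a small Python scanner. This is a hypothetical sketch that applies the stated repeat thresholds, not a reimplementation of MainLabssr.pl, whose handling of compound or overlapping repeats is not described here. It also makes two assumptions. Motifs that are themselves repeats of a shorter unit (for example "AA" or "ACAC") are skipped. Only A, C, G, and T count as valid bases.

```python
# Hypothetical SSR finder using the thresholds stated above; this is NOT
# MainLabssr.pl, and it does not attempt to merge compound or overlapping repeats.
from typing import Dict, List, Tuple

# motif length (bp) -> minimum number of tandem repeats
MIN_REPEATS: Dict[int, int] = {2: 5, 3: 4, 4: 3, 5: 3}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each SSR found."""
    seq = seq.upper()
    ssrs: List[Tuple[int, str, int]] = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i : i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                reps = 1
                # Count consecutive copies of the motif starting at position i.
                while seq[i + reps * k : i + (reps + 1) * k] == motif:
                    reps += 1
                if reps >= min_rep:
                    ssrs.append((i, motif, reps))
                    i += reps * k  # jump past this repeat
                    continue
            i += 1
    return sorted(ssrs)


print(find_ssrs("TC" + "AG" * 6 + "C" + "CAT" * 4 + "G"))
# [(2, 'AG', 6), (15, 'CAT', 4)]
```

Each motif length is scanned independently. As a result, a region can in principle be reported under more than one motif length, which a production tool would need to reconcile.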
Sequence information |
Number of sequences | 6,226 |
Number of sequences having one or more SSRs | 1,853 |
Percentage of sequences having one or more SSRs | 29.76% |
Total number of SSRs found | 2,556 |
Number of motifs | 251 |
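The first four figures in this table can be derived from per-sequence SSR calls, as in the hedged Python sketch below. It assumes the calls are stored as a mapping from sequence ID to a list of SSRs, with an empty list for sequences without SSRs. The name ssr_summary and the toy input are illustrative. The motif count is left out because how distinct motifs were counted is not specified.

```python
# Sketch: derive the "Sequence information" figures from per-sequence SSR calls.
# Assumption: results are held as {sequence_id: [ssr, ...]}, with an empty list
# for sequences that have no SSRs.
from typing import Dict, Sequence


def ssr_summary(ssrs_per_seq: Dict[str, Sequence[object]]) -> Dict[str, float]:
    n_seqs = len(ssrs_per_seq)
    with_ssr = sum(1 for ssrs in ssrs_per_seq.values() if ssrs)
    total = sum(len(ssrs) for ssrs in ssrs_per_seq.values())
    percent = round(100 * with_ssr / n_seqs, 2) if n_seqs else 0.0
    return {
        "sequences": n_seqs,
        "sequences_with_ssr": with_ssr,
        "percent_with_ssr": percent,
        "total_ssrs": total,
    }


# Toy input: three contigs, two of which carry SSRs.
toy = {"contig1": ["(AG)6"], "contig2": [], "contig3": ["(CAT)4", "(TC)5"]}
print(ssr_summary(toy))
# {'sequences': 3, 'sequences_with_ssr': 2, 'percent_with_ssr': 66.67, 'total_ssrs': 3}

# The reported percentage follows from the table: 1,853 of 6,226 sequences.
print(round(100 * 1853 / 6226, 2))  # 29.76
```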
Frequency of Motif Type
Motif Length | Frequency | Percentage Frequency |
2 bp | 801 | 31.33% |
3 bp | 1,379 | 53.95% |
4 bp | 281 | 10.99% |
5 bp | 95 | 3.72% |
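A motif-length table like this one can be produced by counting SSRs per motif length. The Python sketch below is hedged. It assumes each SSR is recorded as a (start, motif, repeat_count) tuple, and the function name and toy data are illustrative. It also confirms that the four frequencies above add up to the 2,556 SSRs reported in the sequence information table.

```python
# Sketch: tabulate SSRs by motif length.
# Assumption: each SSR is a (start, motif, repeat_count) tuple.
from collections import Counter
from typing import Dict, Iterable, Tuple


def motif_length_distribution(
    ssrs: Iterable[Tuple[int, str, int]]
) -> Dict[int, Tuple[int, float]]:
    """Map motif length (bp) -> (frequency, percentage of all SSRs)."""
    counts = Counter(len(motif) for _, motif, _ in ssrs)
    total = sum(counts.values())
    return {
        length: (n, round(100 * n / total, 2)) for length, n in sorted(counts.items())
    }


toy_ssrs = [(2, "AG", 6), (15, "CAT", 4), (40, "TC", 5), (60, "AAT", 4), (90, "ATCC", 3)]
print(motif_length_distribution(toy_ssrs))
# {2: (2, 40.0), 3: (2, 40.0), 4: (1, 20.0)}

# The reported frequencies sum to the total number of SSRs found.
print(801 + 1379 + 281 + 95)  # 2556
```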