Rosaceae Family Unigene v3
Many sequencing projects around the world are depositing ESTs from the family Rosaceae in the NCBI dbEST database. However, not all of these ESTs are of high quality. To filter them, we crossmatched the public sequences against NCBI's UniVec database of vector sequences and used BLAST sequence similarity searches to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences.
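The BLAST screening step can be sketched as a filter over tabular BLAST output. This is a minimal illustration, not the GDR pipeline: it assumes hits against a contaminant database (organelle, tRNA, rRNA) are in the 12-column tabular layout (BLAST+ `-outfmt 6`, legacy `blastall -m 8`), with the query ID in column 1 and the E-value in column 11. The default E-value cutoff of 1e-10 is a hypothetical placeholder because the text does not state one for this step, and the function names are hypothetical as well.

```python
"""Hypothetical sketch: drop ESTs with a significant hit to a contaminant database."""
from typing import Dict, Iterable, Set


def contaminated_ids(hit_lines: Iterable[str], max_evalue: float) -> Set[str]:
    """Return query IDs that have at least one hit at or below max_evalue."""
    flagged: Set[str] = set()
    for line in hit_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            flagged.add(query_id)
    return flagged


def filter_ests(ests: Dict[str, str], hit_lines: Iterable[str],
                max_evalue: float = 1e-10) -> Dict[str, str]:
    """Keep only ESTs (ID -> sequence) with no significant contaminant hit.

    The 1e-10 default is a placeholder assumption, not a GDR parameter.
    """
    bad = contaminated_ids(hit_lines, max_evalue)
    return {est_id: seq for est_id, seq in ests.items() if est_id not in bad}
```

Treating any significant hit as grounds for removal is a simplification; a real pipeline might instead trim or mask only the contaminating region of a read.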
To reduce redundancy and create longer transcripts, we assembled the ESTs within each of the five genera (Fragaria, Malus, Prunus, Pyrus, and Rosa) using the CAP3 [1] program. We then pooled the contigs and singlets from these five assemblies and assembled them together, again using CAP3 with -p 90 (an overlap percent-identity cutoff of 90%).
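The two-stage assembly can be sketched as a small driver that builds CAP3 command lines. This is an illustrative sketch under stated assumptions, not the project's actual scripts: it assumes one FASTA file per genus named `<genus>.fasta`, that CAP3 writes `<input>.cap.contigs` and `<input>.cap.singlets` next to its input, and that the pooled second-stage input is a hypothetical `combined.fasta`. CAP3 can also read per-base quality values from a companion `<input>.qual` file, which is one way phred scores could enter the per-genus assemblies.

```python
"""Hypothetical driver for a two-stage CAP3 assembly (sketch only)."""
import subprocess
from pathlib import Path
from typing import List

GENERA = ["Fragaria", "Malus", "Prunus", "Pyrus", "Rosa"]
COMBINED = "combined.fasta"  # hypothetical name for the pooled stage-2 input


def cap3_command(fasta: Path, identity: int = 90) -> List[str]:
    """CAP3 command line with an overlap percent-identity cutoff (-p)."""
    return ["cap3", str(fasta), "-p", str(identity)]


def cap3_outputs(fasta: Path) -> List[Path]:
    """Contig and singlet FASTA files assumed to be written next to the input."""
    return [Path(f"{fasta}.cap.contigs"), Path(f"{fasta}.cap.singlets")]


def plan(workdir: Path) -> List[List[str]]:
    """Stage 1: one run per genus; stage 2: one run on the pooled output."""
    commands = [cap3_command(workdir / f"{genus}.fasta") for genus in GENERA]
    commands.append(cap3_command(workdir / COMBINED))
    return commands


def pool(workdir: Path) -> Path:
    """Concatenate every stage-1 contig and singlet file into one FASTA file."""
    combined = workdir / COMBINED
    with combined.open("w") as out:
        for genus in GENERA:
            for part in cap3_outputs(workdir / f"{genus}.fasta"):
                text = part.read_text()
                # Guard against a missing final newline merging two records.
                out.write(text if text.endswith("\n") or not text else text + "\n")
    return combined


def run(workdir: Path) -> None:
    """Execute both stages; requires a cap3 binary on PATH."""
    *stage_one, stage_two = plan(workdir)
    for cmd in stage_one:
        subprocess.run(cmd, check=True)
    pool(workdir)
    subprocess.run(stage_two, check=True)
```

Only `run` needs CAP3 installed; `plan` just builds the commands, which keeps the staging logic easy to inspect before anything is executed.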
For some sequences, we were able to obtain the original trace files and incorporate the phred quality values for each base into the per-genus assemblies. The final assembly was annotated by BLAST sequence similarity searches [2] against Swiss-Prot [3], TrEMBL [3], and TAIR's [4] Arabidopsis proteins.
For more information on this project please contact the GDR development team.
All Rosaceae ESTs in GenBank as of June 14, 2006 were included in this assembly. The CAP3 parameter setting used was -p 90. CAP3 outputs assembled contigs and singlets; the tentative unigenes for this assembly are the combined contigs and singlets from the final assembly.
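Because the tentative unigenes are simply the final contigs plus the final singlets, their number can be obtained by counting FASTA records in the two CAP3 output files. A minimal sketch, assuming both files are plain FASTA in which each record starts with a `>` header line (the function names are hypothetical):

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def count_unigenes(contig_lines: Iterable[str], singlet_lines: Iterable[str]) -> int:
    """Tentative unigenes = contigs + singlets from the final assembly."""
    return count_fasta_records(contig_lines) + count_fasta_records(singlet_lines)
```

Open file objects can be passed directly, since iterating over a text file yields its lines.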
Homology was determined by BLASTX searches of the Rosaceae contigs and singlets against the Swiss-Prot and TrEMBL databases. Only matches with an E-value of 1e-9 or better were recorded. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy; TrEMBL is a computer-annotated supplement to Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated into Swiss-Prot.
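The E-value filter can be sketched as follows. The sketch assumes BLASTX results in the 12-column tabular layout (BLAST+ `-outfmt 6`, legacy `blastall -m 8`), with the query ID in column 1, the subject ID in column 2, and the E-value in column 11; the function name is hypothetical. It records every hit at or below the cutoff, ordered from most to least significant, so a downstream step can take the top hit or consider all of them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

EVALUE_CUTOFF = 1e-9  # "1e-9 or better", as stated in the text

Hit = Tuple[str, float]  # (subject ID, E-value)


def significant_hits(hit_lines: Iterable[str],
                     cutoff: float = EVALUE_CUTOFF) -> Dict[str, List[Hit]]:
    """Group hits with E-value <= cutoff by query, most significant first."""
    grouped: Dict[str, List[Hit]] = defaultdict(list)
    for line in hit_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= cutoff:
            grouped[query].append((subject, evalue))
    return {query: sorted(hits, key=lambda h: h[1]) for query, hits in grouped.items()}
```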
The type and frequency of simple sequence repeats (SSRs) in this unigene assembly (v3) were determined using the CUGIssr.pl program. For these searches, an SSR is defined as a dinucleotide repeated at least 5 times, a trinucleotide repeated at least 4 times, or a tetranucleotide or pentanucleotide repeated at least 3 times.
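The SSR criteria can be expressed directly as minimum repeat counts per motif length. The following is a simplified regex-based sketch, not a port of CUGIssr.pl: it reports only perfect, uninterrupted repeats, skips motifs that are themselves repeats of a shorter unit (so a run of `A`s is not reported as an `AA` dinucleotide), and may differ from CUGIssr.pl on overlapping or compound repeats. `find_ssrs` and `motif_type_frequency` are hypothetical names; the latter tallies hits by motif type, the quantity summarized in the motif-type frequency charts.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Minimum repeat count per motif length, as defined in the text.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}
TYPE_NAMES = {2: "di", 3: "tri", 4: "tetra", 5: "penta"}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based offset into the sequence


def is_primitive(motif: str) -> bool:
    """False for motifs that repeat a shorter unit, e.g. 'AA' or 'ATAT'."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> List[SSR]:
    """Find perfect SSRs meeting the per-length minimum repeat counts."""
    seq = seq.upper()
    found: List[SSR] = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                found.append(SSR(motif, len(m.group(0)) // k, m.start()))
    return sorted(found, key=lambda s: s.start)


def motif_type_frequency(ssrs: List[SSR]) -> Counter:
    """Count SSRs by motif type (di-, tri-, tetra-, pentanucleotide)."""
    return Counter(TYPE_NAMES[len(s.motif)] for s in ssrs)
```

For example, `find_ssrs("GGCACACACACAGG")` finds the `CA` motif repeated 5 times starting at offset 2.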
Frequency of Motif Type - ESTs
Frequency of Motif Type - Contigs
No publications are currently available.
Contig GO Terms
The GO terms (www.geneontology.org) were determined by comparing the contigs against Swiss-Prot using BLAST. The Sprot2GO annotation file was then used to map GO terms to the sequences, using only matches with an E-value of 1e-9 or better.
4054 contigs have Biological Process annotation.
4656 contigs have Cellular Component annotation.
5865 contigs have Molecular Function annotation.
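The GO mapping step can be sketched as a join between each contig's significant Swiss-Prot hits and an accession-to-GO table. The layout of the Sprot2GO file is not described here, so the sketch assumes a hypothetical tab-separated file with an accession, a GO ID, and an aspect letter (`P`, `C`, or `F`) per line. It also assumes the contig-to-accession input contains only hits that already passed the 1e-9 cutoff. Because a single contig can carry terms from more than one aspect, the three counts need not sum to the number of annotated contigs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

ASPECTS = {"P": "Biological Process", "C": "Cellular Component", "F": "Molecular Function"}


def load_sprot2go(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Accession -> [(GO ID, aspect letter)], from a hypothetical 3-column TSV."""
    mapping: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        acc, go_id, aspect = line.rstrip("\n").split("\t")[:3]
        mapping[acc].append((go_id, aspect))
    return dict(mapping)


def contigs_per_aspect(contig_hits: Dict[str, List[str]],
                       sprot2go: Dict[str, List[Tuple[str, str]]]) -> Dict[str, int]:
    """Count contigs with at least one GO term in each aspect."""
    annotated: Dict[str, Set[str]] = {name: set() for name in ASPECTS.values()}
    for contig, accessions in contig_hits.items():
        for acc in accessions:
            for _go_id, aspect in sprot2go.get(acc, []):
                if aspect in ASPECTS:
                    annotated[ASPECTS[aspect]].add(contig)
    return {name: len(contigs) for name, contigs in annotated.items()}
```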
eSNP summary not available for Rosaceae Assembly v3.