Fragaria vesca Whole Genome v1.0 (build 8) Assembly & Annotation
|Analysis Name||Fragaria vesca Whole Genome v1.0 (build 8) Assembly & Annotation|
|Software||Celera Assembler (5.3)|
|Source||A combination of paired and unpaired 454, Illumina and SOLiD reads|
|Materials & Methods||
About the Assembly
The assembly and annotation files below come from the assembly presented in the strawberry genome paper by Shulaev et al. (2010). The assembly was constructed from 454, Illumina and Sanger reads with the Celera Assembler by Arthur Delcher at the University of Maryland. The pseudomolecules are the final assembly (v1.0) of the chromosomal sequence and were built from ordered and oriented scaffolds. Three sets of sequences are available for download: the pseudomolecules, the scaffolds used to construct them, and the low-quality degenerate scaffolds that were not used in their construction.
|Scaffolds (FASTA file)||fvesca_v1.0_scaffolds.fna.gz|
|Scaffolds (GFF3 file)||fvesca_v1.0_scaffolds.gff3.gz|
|Degenerate scaffolds (FASTA file)||fvesca_v1.0_scaffolds-degen.fna.gz|
|Degenerate scaffolds (GFF3 file)||fvesca_v1.0_scaffolds-degen.gff3.gz|
|Pseudomolecules (FASTA file)||fvesca_v1.0_pseudo.fna.gz|
|Pseudomolecules (GFF3 file)||fvesca_v1.0_pseudo.gff3.gz|
|Mapping of scaffolds to pseudomolecules (GFF3 file)||fvesca_v1.0_scaffolds2pseudo.gff3.gz|
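Because the pseudomolecules are built from ordered and oriented scaffolds, a position on a scaffold can be converted to a pseudomolecule position using the scaffold-to-pseudomolecule mapping file. The Python sketch below assumes that each feature line of fvesca_v1.0_scaffolds2pseudo.gff3.gz places one whole scaffold (column 1 = pseudomolecule, columns 4-5 = placed span, column 7 = orientation) and that the scaffold ID is stored in a `Name` attribute. Both are assumptions to check against the actual file, and the function names are hypothetical.

```python
import gzip
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Placement:
    """Location of one scaffold on a pseudomolecule (GFF3-style, 1-based, inclusive)."""
    pseudomolecule: str
    start: int
    end: int
    strand: str  # "+" = scaffold used as-is, "-" = scaffold reverse-complemented


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column such as 'ID=a;Name=b' into a dict."""
    pairs = (field.split("=", 1) for field in column9.strip().split(";") if "=" in field)
    return {key: value for key, value in pairs}


def read_placements(path: str, id_key: str = "Name") -> Dict[str, Placement]:
    """Read scaffold placements from a plain or gzipped GFF3 file.

    ASSUMPTION: each feature line places one scaffold, and the scaffold ID is
    stored in the attribute named by `id_key` ("Name" is a hypothetical default).
    """
    opener = gzip.open if path.endswith(".gz") else open
    placements: Dict[str, Placement] = {}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue
            attrs = parse_attributes(cols[8])
            if id_key in attrs:
                placements[attrs[id_key]] = Placement(cols[0], int(cols[3]), int(cols[4]), cols[6])
    return placements


def scaffold_to_pseudo(pos: int, p: Placement) -> int:
    """Convert a 1-based scaffold position to a 1-based pseudomolecule position.

    ASSUMPTION: the placed span has exactly the scaffold's length (no trimming).
    """
    length = p.end - p.start + 1
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} lies outside the {length} bp placement")
    if p.strand == "-":
        # A reversed scaffold starts at the right-hand end of its placement.
        return p.end - pos + 1
    return p.start + pos - 1
```

With a made-up placement, `scaffold_to_pseudo(10, Placement("LG1", 1001, 2000, "-"))` returns 1991: the tenth base of a reversed scaffold lies ten bases in from the right-hand end of its placement.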
Ab Initio GeneMark Predictions
Ab initio GeneMark gene predictions were generated by Mark Borodovsky, Paul Burns and Alexandre Lomsadze (Georgia Institute of Technology, Atlanta, Georgia, USA) and are dated 10-12-2009. They were created with the new 2.51 version of the GeneMark-ES program (Lomsadze et al., 2005), developed by Paul Burns. GeneMark-ES uses an iterative self-training algorithm to train a GMHMM gene model; the model is used to predict genes at each training iteration as well as in the final prediction step.
Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506.
Hybrid GeneMark Predictions
Hybrid GeneMark gene predictions were generated by Mark Borodovsky, Paul Burns and Alexandre Lomsadze (Georgia Institute of Technology, Atlanta, Georgia, USA) using EST data as constraints, and were released 16-11-2009. The hybrid predictions were created with the same 2.51 version of GeneMark-ES and the same iterative GMHMM self-training procedure described for the ab initio set (Lomsadze et al., 2005).
|GeneMark hybrid genes (GFF3 file)||fvesca_v1.0_genemark_hybrid.gff3.gz|
|GeneMark hybrid genes mapped to pseudomolecules (GFF3 file)||fvesca_v1.0_genemark_hybrid2pseudo.gff3.gz|
|GeneMark hybrid CDS (FASTA file)||fvesca_v1.0_genemark_hybrid.fna.gz|
|GeneMark hybrid CDS w/ annotations (FASTA file)||fvesca_v1.0_genemark_hybrid.annotated.fna.gz|
|GeneMark hybrid proteins (FASTA file)||fvesca_v1.0_genemark_hybrid.faa.gz|
|GeneMark hybrid proteins w/ annotations (FASTA file)||fvesca_v1.0_genemark_hybrid.annotated.faa.gz|
|GeneMark ab initio genes (GFF3 file)||fvesca_v1.0_genemark_abinitio.gff3.gz|
|GeneMark ab initio CDS (FASTA file)||fvesca_v1.0_genemark_abinitio.fna.gz|
|GeneMark ab initio proteins (FASTA file)||fvesca_v1.0_genemark_abinitio.faa.gz|
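A quick way to see what a GeneMark GFF3 file contains, or to compare it with the matching protein FASTA, is to tally feature types and count FASTA records. The Python sketch below relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3, an optional trailing `##FASTA` section); the helper names are hypothetical, and which feature type should correspond one-to-one with protein records (for example `mRNA` versus `gene`) is an assumption to verify per file.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally GFF3 column 3 (feature type) over all feature lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # per the GFF3 spec, everything after ##FASTA is sequence
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


def open_text(path: str):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_gff3(path: str) -> Counter:
    """Feature-type counts for one GFF3 file."""
    with open_text(path) as handle:
        return count_feature_types(handle)


def count_fasta_records(path: str) -> int:
    """Count FASTA records by their '>' header lines."""
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `count_gff3("fvesca_v1.0_genemark_hybrid.gff3.gz")` can be set beside `count_fasta_records("fvesca_v1.0_genemark_hybrid.faa.gz")` as a sanity check after download.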
The chloroplast genome was constructed from Illumina reads and annotated using DOGMA by Aaron Liston at Oregon State University.
|Chloroplast genes (GFF3 file)||fvesca_v1.0_chloroplast.gff3.gz|
|Chloroplast genes (Genbank file)||fvesca_v1.0_chloroplast.gb|
|Chloroplast (FASTA file)||fvesca_v1.0_chloroplast.fna|
|InterPro motifs for GeneMark ab initio derived transcripts||fvesca_v1.0_ab_initio_interproscan_data.txt.gz|
|InterPro motifs for GeneMark hybrid derived transcripts||fvesca_v1.0_hybrid_interproscan_data.txt.gz|
|InterPro GO for GeneMark ab initio derived transcripts||fvesca_v1.0_ab_initio_interpro_go_data.txt.gz|
|InterPro GO for GeneMark hybrid derived transcripts||fvesca_v1.0_hybrid_interpro_go_data.txt.gz|
Best-hit reports from BLASTP searches of the Fragaria vesca genome v1.0 proteins against several protein databases are provided in Excel format. These BLAST results were generated by the MainLab Bioinformatics group at Washington State University using the protein sequences derived from the hybrid GeneMark gene models.
UniRef90 release 15.6 (5,801,325 entries) was obtained from the UniProt FTP site. NCBI BLAST 2.2.18 tblastn was used to compare these protein sequences against the build 8 strawberry genome scaffolds, translated on the fly in all six reading frames. The hits were split into separate files by E-value range.
|ExPASy SwissProt (Excel file)||fvesca_v1.0_vs_sprot.xls|
|NCBI nr (Excel file)||fvesca_v1.0_vs_nr.xls|
|ExPASy TrEMBL (Excel file)||fvesca_v1.0_vs_trembl.xls|
|Malus x domestica (v1.0) proteins (Excel file)||fvesca_v1.0_vs_apple.xls|
|TAIR10 (Arabidopsis) proteins (Excel file)||fvesca_v1.0_vs_arabidopsis.xls|
|Prunus persica (peach) v1.0 proteins (Excel file)||fvesca_v1.0_vs_peach.xls|
|Vitis vinifera (grape) proteins (Excel file)||fvesca_v1.0_vs_grape.xls|
|Populus trichocarpa (poplar) v2.0 proteins (Excel file)||fvesca_v1.0_vs_poplar.xls|
|UniRef90 tblastn alignments, E-value between 1e-120 and 1e-100 (GFF3 file)||fvesca_v1.0-UniRef90_tblastn_e100.gff3.gz|
|UniRef90 tblastn alignments, E-value between 1e-140 and 1e-120 (GFF3 file)||fvesca_v1.0-UniRef90_tblastn_e120.gff3.gz|
|UniRef90 tblastn alignments, E-value between 1e-160 and 1e-140 (GFF3 file)||fvesca_v1.0-UniRef90_tblastn_e140.gff3.gz|
|UniRef90 tblastn alignments, E-value between 1e-180 and 1e-160 (GFF3 file)||fvesca_v1.0-UniRef90_tblastn_e160.gff3.gz|
|UniRef90 tblastn alignments, E-value of 1e-180 or lower (GFF3 file)||fvesca_v1.0-UniRef90_tblastn_e180.gff3.gz|
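The two post-processing steps described above, keeping a single best hit per query and splitting hits by significance, can be sketched in Python. This is not the MainLab pipeline: it assumes 12-column tabular BLAST output (legacy `blastall -m 8` or BLAST+ `-outfmt 6`, E-value in column 11 and bit score in column 12), defines "best" as lowest E-value with ties broken by highest bit score, and treats each `eN` file as holding hits with 1e-(N+20) < E <= 1e-N. Inclusive/exclusive boundaries are an assumption, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Upper E-value bound exponents of the five UniRef90 files, strictest first.
BIN_EXPONENTS = [180, 160, 140, 120, 100]


def evalue_bin(evalue: float) -> Optional[str]:
    """Return the file suffix (e.g. 'e120') whose range contains `evalue`.

    ASSUMPTION: 'e180' takes every hit with E <= 1e-180 (including 0.0),
    and hits weaker than 1e-100 fall in no file (None).
    """
    for exp in BIN_EXPONENTS:
        if evalue <= float(f"1e-{exp}"):  # parse the literal to avoid pow() rounding
            return f"e{exp}"
    return None


def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[str, float, float]]:
    """Keep one (subject, evalue, bitscore) per query from tabular BLAST rows.

    ASSUMPTION: best = lowest E-value, ties broken by highest bit score.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a 12-column tabular row
        query, subject = cols[0], cols[1]
        evalue, bits = float(cols[10]), float(cols[11])
        current = best.get(query)
        # Smaller (evalue, -bits) means a stronger hit.
        if current is None or (evalue, -bits) < (current[1], -current[2]):
            best[query] = (subject, evalue, bits)
    return best
```

For instance, `evalue_bin(1e-130)` returns `"e120"`, matching the file that covers E-values between 1e-140 and 1e-120.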
The Rosaceae family Conserved Orthologous Set (COS) markers have been aligned to the F. vesca v1.0 scaffolds.
|Rosaceae family Conserved Orthologous Set (COS) markers||fvesca_v1.0_COSMarker-Rosaceae_family.gff3|
LTR retrotransposons were identified using LTR_STRUC (McCarthy E.M. and McDonald J.F. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics, 2003, 19: 362-367) and LTRharvest (Ellinghaus D., Kurtz S. and Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 2008, 9: 18).
|LTRharvest predicted repeats||fvesca_v1.0_repeats_ltrharvest.gff3.gz|
|LTR_STRUC predicted repeats||fvesca_v1.0_repeats_ltr_struc.gff3.gz|
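Because two programs were run, their predictions can overlap. One way to estimate the combined genome span of the LTR predictions without double counting is to merge overlapping intervals. The Python sketch below is a generic interval merge; it assumes both files use the same sequence IDs, and the feature type passed to `read_intervals` (for example `repeat_region`) is a hypothetical choice that should be checked against column 3 of each file.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (seqid, start, end), 1-based inclusive as in GFF3


def read_intervals(path: str, feature_type: str) -> List[Interval]:
    """Collect (seqid, start, end) for one feature type from a GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    out: List[Interval] = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # sequence section; no more features
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9 and cols[2] == feature_type:
                out.append((cols[0], int(cols[3]), int(cols[4])))
    return out


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals on the same sequence."""
    merged: List[Interval] = []
    for seqid, start, end in sorted(intervals):
        if merged and merged[-1][0] == seqid and start <= merged[-1][2] + 1:
            last = merged[-1]
            merged[-1] = (seqid, last[1], max(last[2], end))
        else:
            merged.append((seqid, start, end))
    return merged


def covered_bases(intervals: Iterable[Interval]) -> Dict[str, int]:
    """Bases covered per sequence, counting overlapping predictions once."""
    totals: Dict[str, int] = {}
    for seqid, start, end in merge_intervals(intervals):
        totals[seqid] = totals.get(seqid, 0) + end - start + 1
    return totals
```

A possible call is `covered_bases(read_intervals("fvesca_v1.0_repeats_ltrharvest.gff3.gz", "repeat_region") + read_intervals("fvesca_v1.0_repeats_ltr_struc.gff3.gz", "repeat_region"))`, with the feature type adjusted to whatever each file actually uses.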
F. vesca ESTs were pre-clustered onto the scaffolds and then assembled using MIRA. The assembled clusters were aligned to the scaffolds using GMAP, and only the single best match for each cluster was kept. The analysis was performed by Todd Mockler and Henry Priest at Oregon State University.
|Fragaria 454 ESTs aligned using GMAP (GFF3 file)||fvesca_v1.0_gmap-Fragaria_vesca-454ESTs.gff3.gz|
|Fragaria cDNA from Genbank (GFF3 file)||fvesca_v1.0_genbank-Fragaria-cDNA.gff3.gz|
|RefSeq mRNA Alignments||fvesca_v1.0_refseq.gff3.gz|
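The "single best match per cluster" step can be illustrated with a small Python sketch. The ranking score used here (identity multiplied by coverage) is a hypothetical stand-in, since the criterion actually used for this track is not stated; the `Alignment` fields are likewise assumptions rather than a description of the GMAP output format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Alignment:
    cluster_id: str
    scaffold: str
    start: int
    end: int
    identity: float  # fraction of aligned bases that match, 0..1
    coverage: float  # fraction of the cluster sequence that aligned, 0..1

    @property
    def score(self) -> float:
        # HYPOTHETICAL ranking: identity x coverage. The real criterion
        # used to pick the best match is not documented here.
        return self.identity * self.coverage


def best_per_cluster(alignments: Iterable[Alignment]) -> Dict[str, Alignment]:
    """Keep the highest-scoring alignment for each EST cluster (first one wins ties)."""
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        kept = best.get(aln.cluster_id)
        if kept is None or aln.score > kept.score:
            best[aln.cluster_id] = aln
    return best
```

Keeping one alignment per cluster avoids counting the same transcript at several paralogous or repetitive loci, at the cost of discarding genuine secondary placements.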
Fragaria iinumae 454 contigs (assembly 21-07-2009) were mapped to the Fragaria vesca reference scaffolds. The F. iinumae contigs are generally short due to low coverage.
|Fragaria iinumae alignments from 454 assembly (GFF3 file)||fvesca_v1.0_gmap-Fragaria_iinumae-strawberry.gff3.gz|
|Fragaria iinumae 454 assembly (FASTA file)||fragaria_iinumae_454AllContigs.fna.gz|
|Fragaria iinumae alignments to pseudomolecules (GFF3 file)||fvesca_v1.0-LG_nucmer-Fragaria_iinumae.gff3.gz|
|F. vesca Pawtuckaway fosmid alignments to pseudomolecules (GFF3 file)||fvesca_v1.0-LG_nucmer-Pawtuckaway-formids.gff3.gz|
Gene indices were obtained from The Gene Index Project (http://compbio.dfci.harvard.edu/tgi/). The following plant species gene indices were obtained: Apple (Malus domestica) v1.0 7-8-08 (MdGI), Arabidopsis thaliana v13.0 6-16-06 (AGI), Beet (Beta vulgaris) v2.0 5-19-08 (BvGI), Brassica napus v3.1 5-31-08 (BnGI), Cotton (Gossypium) v9.0 6-14-08 (CGI), Grape (Vitis vinifera) v6.0 7-30-08 (VvGI), Sunflower (Helianthus annuus) v5.0 (HaGI), Lettuce (Lactuca sativa) v3.0 7-2-08 (LsGI), Medicago truncatula v9.0 7-16-08 (MtGI), Onion (Allium cepa) v2.0 (OnGI), Orange (Citrus sinensis) v1.0 6-25-08 (CsGI), Peach (Prunus persica) v1.0 6-29-08 (PrpeGI), Pepper (Capsicum annuum) v3.0 7-17-08 (CaGI), Pinus v7.0 7-23-08 (PGI), Soybean (Glycine max) v13.0 7-11-08 (GmGI), Sugar Cane (Saccharum officinarum) v2.2 (SoGI), and Tomato (Solanum lycopersicum) v12.0 7-15-08 (LGI). The gene indices were mapped to the genome sequence scaffolds and unassembled contigs using GMAP version 2007-09-28 (http://www.gene.com/share/gmap/; Wu and Watanabe, 2005).
All assembly and annotation files are available for download by selecting the desired data type in the "Resources" sidebar on the right. Each data type page describes the available files and provides links to download them.