|Analysis Name||Fragaria vesca Whole Genome v1.0 (build 8) Assembly & Annotation |
|Method||Celera Assembler (5.3) |
|Source||A combination of paired and unpaired 454, Illumina and SOLiD reads |
|Date performed||2010-12-26 |
About the Assembly
From this site you can browse and download the whole genome sequence, predicted gene models, functional annotations, comparative alignments and more from the strawberry genome assembly v1.0 (build 8) published by Shulaev et al., 2010. Select a link in the Resources box for further details.
Shulaev et al. The genome of woodland strawberry (Fragaria vesca). Nature Genetics. 43, 109-116. 2010
All assembly and annotation files are available for download by selecting the desired data type in the right-hand side bar. Each data type page provides a description of the available files and links to download them.
The assembly and annotation files below are from the assembly as presented in the Shulaev et al., 2010 strawberry genome paper. The assembly was constructed from 454, Illumina and Sanger reads using the Celera Assembler and was assembled by Arthur Delcher at the University of Maryland. The pseudomolecules are the final assembly (v1.0) of the chromosomal sequence and were derived from ordered and oriented scaffolds. The pseudomolecules, the scaffolds used to construct them, and the low-quality degenerate scaffolds (not used in construction of the pseudomolecules) are available for download.
|Scaffolds (FASTA file)
|Scaffolds (GFF3 file)
|Degenerate scaffolds (FASTA file)
|Degenerate scaffolds (GFF3 file)
|Pseudomolecules (FASTA file)
|Pseudomolecules (GFF3 file)
|Mapping of scaffolds to pseudomolecules (GFF3 file)
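Because the pseudomolecules were built from ordered and oriented scaffolds, a position on a scaffold can be translated to a pseudomolecule position once the placement of that scaffold is known. The following is a minimal Python sketch of that conversion. It assumes, without confirmation from the file itself, that each row of the mapping GFF3 names the pseudomolecule in column 1, gives the scaffold placement in the start/end columns, gives the orientation in the strand column, and carries the scaffold identifier in a Name or ID attribute. The identifiers LG1, scf0001 and scf0002 are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Placement(NamedTuple):
    pseudomolecule: str
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def parse_mapping(gff3_lines: Iterable[str]) -> Dict[str, Placement]:
    """Read scaffold placements from GFF3 rows (assumed layout, see lead-in)."""
    placements: Dict[str, Placement] = {}
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature row
        seqid, start, end, strand, attrs = cols[0], cols[3], cols[4], cols[6], cols[8]
        attr = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        name = attr.get("Name") or attr.get("ID")  # assumption: scaffold id lives here
        if name is not None:
            placements[name] = Placement(seqid, int(start), int(end), strand)
    return placements


def scaffold_to_pseudomolecule(
    placements: Dict[str, Placement], scaffold: str, pos: int
) -> Tuple[str, int]:
    """Convert a 1-based scaffold position to a 1-based pseudomolecule position."""
    p = placements[scaffold]
    length = p.end - p.start + 1
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} is outside {scaffold} (length {length})")
    if p.strand == "-":
        # Reverse-oriented scaffold: its first base sits at the placement end.
        return p.pseudomolecule, p.end - pos + 1
    return p.pseudomolecule, p.start + pos - 1


# Hypothetical rows, for illustration only.
example = [
    "##gff-version 3",
    "\t".join(["LG1", ".", "scaffold", "1001", "2000", ".", "+", ".", "ID=scf0001;Name=scf0001"]),
    "\t".join(["LG1", ".", "scaffold", "2101", "2600", ".", "-", ".", "ID=scf0002;Name=scf0002"]),
]
placements = parse_mapping(example)
print(scaffold_to_pseudomolecule(placements, "scf0001", 1))  # ('LG1', 1001)
print(scaffold_to_pseudomolecule(placements, "scf0002", 1))  # ('LG1', 2600)
```

Check the attribute names and feature types in the downloaded mapping file before relying on this layout, since they may differ from the assumption made here.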
Ab Initio GeneMark Predictions
Ab initio GeneMark gene predictions, dated 10-12-2009, were generated by Mark Borodovsky, Paul Burns and Alexandre Lomsadze (Georgia Institute of Technology, Atlanta, Georgia, USA). The ab initio predictions for Fragaria vesca were created using the new 2.51 version of the GeneMark-ES program (Lomsadze et al., 2005) developed by Paul Burns. This program uses an iterative self-training algorithm to train a GMHMM gene model, which is used for gene prediction inside the iterative training as well as for the final prediction step.
Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506.
Hybrid GeneMark Predictions
Hybrid GeneMark gene predictions, released 16-11-2009, were generated by Mark Borodovsky, Paul Burns and Alexandre Lomsadze (Georgia Institute of Technology, Atlanta, Georgia, USA) with EST data constraints. The ab initio predictions for Fragaria vesca were created using the new 2.51 version of the GeneMark-ES program (Lomsadze et al., 2005) developed by Paul Burns. This program uses an iterative self-training algorithm to train a GMHMM gene model, which is used for gene prediction inside the iterative training as well as for the final prediction step.
Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506.
|GeneMark hybrid genes (GFF3 file)
|GeneMark hybrid genes mapped to pseudomolecules (GFF3 file)
|GeneMark hybrid CDS (FASTA file)
|GeneMark hybrid CDS w/ annotations (FASTA file)
|GeneMark hybrid proteins (FASTA file)
|GeneMark hybrid proteins w/ annotations (FASTA file)
|GeneMark ab initio genes (GFF3 file)
|GeneMark ab initio CDS (FASTA file)
|GeneMark ab initio proteins (FASTA file)
LTRs were identified using LTR_STRUC (McCarthy, E.M., and J.F. McDonald. 2003. LTR_STRUC: A novel search and identification program for LTR retrotransposons. Bioinformatics 19: 362-367) and LTRharvest (Ellinghaus, D., S. Kurtz, and U. Willhoeft. 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9: 18).
The Rosaceae family Conserved Orthologous Set (COS) markers have been aligned to the F. vesca v1.0 scaffolds.
|Rosaceae family Conserved Orthologous Set (COS) markers
Fragaria Transcript Alignments
This page contains several transcript alignments to the F. vesca v1.0 genome, including a 454 transcript data set, cDNA records and RefSeq sequences from GenBank.
The F. vesca ESTs from 454 sequencing were pre-clustered onto scaffolds and then assembled using MIRA. Assembled clusters were then aligned to the scaffolds using GMAP, and only the single best match for each cluster was kept. The analysis was performed by Todd Mockler and Henry Priest at Oregon State University.
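The "single best match" filter described above can be sketched in a few lines of Python. This is an illustration only: the page does not say how the best match was ranked, so the `score` field below is a hypothetical stand-in for whatever alignment quality measure was used, and the cluster and scaffold names are invented.

```python
from typing import Dict, Iterable, Mapping


def best_hit_per_cluster(hits: Iterable[Mapping]) -> Dict[str, Mapping]:
    """Keep one alignment per cluster: the one with the highest score.

    Each hit is a mapping with at least 'cluster' and 'score' keys.
    Ties keep the hit that was seen first.
    """
    best: Dict[str, Mapping] = {}
    for hit in hits:
        cluster = hit["cluster"]
        if cluster not in best or hit["score"] > best[cluster]["score"]:
            best[cluster] = hit
    return best


# Hypothetical alignments, for illustration only.
hits = [
    {"cluster": "cl_001", "scaffold": "scf0001", "score": 0.91},
    {"cluster": "cl_001", "scaffold": "scf0007", "score": 0.98},
    {"cluster": "cl_002", "scaffold": "scf0003", "score": 0.88},
]
kept = best_hit_per_cluster(hits)
print(kept["cl_001"]["scaffold"])  # scf0007
print(len(kept))                   # 2
```

Keeping one alignment per cluster gives each assembled transcript a single location on the scaffolds, at the cost of hiding paralogous or duplicated loci that a lower-ranked alignment might have revealed.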
Gene Index Alignments
Gene indices were obtained from The Gene Index Project (http://compbio.dfci.harvard.edu/tgi/). The following plant species gene indices were obtained: Apple (Malus domestica) v1.0 7-8-08 (MdGI), Arabidopsis thaliana v13.0 6-16-06 (AGI), Beet (Beta vulgaris) v2.0 5-19-08 (BvGI), Brassica napus v3.1 5-31-08 (BnGI), Cotton (Gossypium) v9.0 6-14-08 (CGI), Grape (Vitis vinifera) v6.0 7-30-08 (VvGI), Sunflower (Helianthus annuus) v5.0 (HaGI), Lettuce (Lactuca sativa) v3.0 7-2-08 (LsGI), Medicago truncatula v9.0 7-16-08 (MtGI), Onion (Allium cepa) v2.0 (OnGI), Orange (Citrus sinensis) v1.0 6-25-08 (CsGI), Peach (Prunus persica) v1.0 6-29-08 (PrpeGI), Pepper (Capsicum annuum) v3.0 7-17-08 (CaGI), Pinus v7.0 7-23-08 (PGI), Soybean (Glycine max) v13.0 7-11-08 (GmGI), Sugar Cane (Saccharum officinarum) v2.2 (SoGI), and Tomato (Solanum lycopersicum) v12.0 7-15-08 (LGI). Gene indices were mapped to the genome sequence scaffolds and unassembled contigs using GMAP version 2007-09-28 (http://www.gene.com/share/gmap/; Wu and Watanabe, 2005).
|Allium cepa, onion (GFF3 file)
|Brassica napus, rapeseed
|Capsicum annuum, pepper
|Citrus sinensis, sweet orange
|Glycine max, soybean
|Helianthus annuus, sunflower
|Lactuca sativa, lettuce
|Malus domestica, apple
|Saccharum officinarum, sugarcane
|Solanum lycopersicum, tomato
|Triticum aestivum, wheat
|Vitis vinifera, grape