Fragaria vesca Whole Genome v1.0 (build 8) Assembly & Annotation

Overview
Analysis Name: Fragaria vesca Whole Genome v1.0 (build 8) Assembly & Annotation
Method: Celera Assembler (5.3)
Source: A combination of paired and unpaired 454, Illumina and SOLiD reads
Date performed: 2010-12-26

About the Assembly
From this site you can browse and download the whole genome sequence, predicted gene models, functional annotations, comparative alignments and more from the strawberry genome assembly v1.0 (build 8) published by Shulaev et al. (2010). Select a link in the Resources box for further details.

 
Publications
Shulaev et al. The genome of woodland strawberry (Fragaria vesca). Nature Genetics. 43, 109-116. 2010

 

Homology

Excel Reports
Best hit reports of blastp of Fragaria vesca genome v1.0 proteins versus various protein databases.  Results in Excel format.  These blast results are provided by the MainLab Bioinformatics group at Washington State University using protein sequences derived from the hybrid GeneMark gene models.

Uniref90 Alignments
Uniref90 release 15.6 consisting of 5,801,325 entries was obtained from the UniProt FTP site. NCBI Blast version 2.2.18 tblastn was used to compare the protein sequences against the Strawberry Genome Assembly version 8 scaffolds dynamically translated in all reading frames. Hits were divided into separate files with different significance levels.
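
The division of hits by significance level can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the bin labels are taken from the download file names (e100 to e180), while the function name `evalue_bin` and the treatment of values that fall exactly on a boundary are assumptions.

```python
# Upper E-value limit for each bin, most stringent first.
# Labels mirror the file name suffixes; inclusive boundaries are an assumption.
BINS = [
    ("e180", 1e-180),
    ("e160", 1e-160),
    ("e140", 1e-140),
    ("e120", 1e-120),
    ("e100", 1e-100),
]


def evalue_bin(evalue):
    """Return the label of the most stringent bin the E-value satisfies.

    Returns None when the hit is weaker than 1e-100 and so belongs to no bin.
    An E-value of 0.0 (reported by BLAST for extremely strong hits) falls in
    the e180 bin.
    """
    for label, upper_limit in BINS:
        if evalue <= upper_limit:
            return label
    return None


if __name__ == "__main__":
    for value in (3e-101, 1e-130, 0.0, 1e-50):
        print(value, evalue_bin(value))
```

A hit with an E-value of 1e-130, for example, is placed in the e120 bin, because it passes the 1e-120 threshold but not the 1e-140 one.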

Downloads

ExPASy SwissProt (Excel file) Fragaria_vesca_v1.0_vs_sprot.xlsx
NCBI nr (Excel file) Fragaria_vesca_v1.0_vs_nr.xlsx
ExPASy TrEMBL (Excel file) Fragaria_vesca_v1.0_vs_trembl.xlsx
Malus x domestica (v1.0) proteins (Excel file) Fragaria_vesca_v1.0_vs_apple.xlsx
TAIR10 (arabidopsis) proteins (Excel file) Fragaria_vesca_v1.0_vs_tair.xlsx
Prunus persica (peach) v1.0 proteins (Excel file) Fragaria_vesca_v1.0_vs_peach.xlsx
Vitis vinifera (grape) proteins (Excel file) Fragaria_vesca_v1.0_vs_grape.xlsx
Populus trichocarpa (poplar) v2.0 proteins (Excel file) Fragaria_vesca_v1.0_vs_poplar.xlsx
Uniref90 tblastn alignments with E-values between 1e-120 and 1e-100 (GFF3 file) fvesca_v1.0-UniRef90_tblastn_e100.gff3.gz
Uniref90 tblastn alignments with E-values between 1e-140 and 1e-120 (GFF3 file) fvesca_v1.0-UniRef90_tblastn_e120.gff3.gz
Uniref90 tblastn alignments with E-values between 1e-160 and 1e-140 (GFF3 file) fvesca_v1.0-UniRef90_tblastn_e140.gff3.gz
Uniref90 tblastn alignments with E-values between 1e-180 and 1e-160 (GFF3 file) fvesca_v1.0-UniRef90_tblastn_e160.gff3.gz
Uniref90 tblastn alignments with E-values of 1e-180 or lower (GFF3 file) fvesca_v1.0-UniRef90_tblastn_e180.gff3.gz

 

Downloads

All assembly and annotation files are available for download by selecting the desired data type in the right-hand side bar. Each data type page provides a description of the available files and links to download them.

Assembly

The assembly and annotation files below are from the assembly as presented in the Shulaev et al. 2010 strawberry genome paper. The assembly was constructed from 454, Illumina and Sanger reads using the Celera assembler and was assembled by Arthur Delcher at the University of Maryland. The pseudomolecules are the final assembly (v1.0) of the chromosomal sequence. The pseudomolecule sequences were derived from ordered and oriented scaffolds. The pseudomolecules, the scaffolds used to construct the pseudomolecules, and the low-quality degenerate scaffolds (not used in construction of the pseudomolecules) are available for download.
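
As a rough illustration of what "ordered and oriented scaffolds" means, the sketch below joins scaffolds into one pseudomolecule sequence. It is not the procedure used for this assembly: the function names, the layout format (a list of scaffold ID and strand pairs) and the fixed run of N characters between scaffolds are all assumptions made for the example.

```python
# Translation table for complementing nucleotides (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(scaffolds, layout, gap_length=100):
    """Concatenate scaffolds in the given order and orientation.

    scaffolds  -- dict mapping scaffold ID to sequence
    layout     -- list of (scaffold_id, strand) tuples, strand is '+' or '-'
    gap_length -- number of N characters placed between adjacent scaffolds
                  (hypothetical value; the real gap sizes are not stated here)
    """
    gap = "N" * gap_length
    pieces = []
    for scaffold_id, strand in layout:
        seq = scaffolds[scaffold_id]
        if strand == "-":
            # A scaffold on the minus strand is flipped before joining.
            seq = reverse_complement(seq)
        pieces.append(seq)
    return gap.join(pieces)


if __name__ == "__main__":
    toy_scaffolds = {"scf1": "AAAC", "scf2": "GGT"}
    toy_layout = [("scf1", "+"), ("scf2", "-")]
    print(build_pseudomolecule(toy_scaffolds, toy_layout, gap_length=3))
```

The actual placement of each scaffold on a pseudomolecule is recorded in the scaffolds-to-pseudomolecules GFF3 file listed in the downloads.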

Downloads

Scaffolds (FASTA file) fvesca_v1.0_scaffolds.fna.gz
Scaffolds (GFF3 file) fvesca_v1.0_scaffolds.gff3.gz
Degenerate scaffolds (FASTA file) fvesca_v1.0_scaffolds-degen.fna.gz
Degenerate scaffolds (GFF3 file) fvesca_v1.0_scaffolds-degen.gff3.gz
Pseudomolecules (FASTA file) fvesca_v1.0_pseudo.fna.gz
Pseudomolecules (GFF3 file) fvesca_v1.0_pseudo.gff3.gz
Mapping of scaffolds to pseudomolecules (GFF3 file) fvesca_v1.0_scaffolds2pseudo.gff3.gz

 

Gene Predictions

Ab Initio GeneMark Predictions
Ab initio GeneMark gene predictions, dated 10-12-2009, were generated by Mark Borodovsky, Paul Burns and Alexandre Lomsadze (Georgia Institute of Technology, Atlanta, Georgia, USA). The ab initio predictions for Fragaria vesca were created using the new 2.51 version of the GeneMark-ES program [1] developed by Paul Burns. This program uses an iterative self-training algorithm to train a GMHMM gene model, which is used for gene prediction inside the iterative training as well as for the final prediction step.
[1] Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506.

Hybrid GeneMark Predictions
Hybrid GeneMark gene predictions with EST data constraints, released 16-11-2009, were generated by Mark Borodovsky, Paul Burns and Alexandre Lomsadze (Georgia Institute of Technology, Atlanta, Georgia, USA). The ab initio predictions for Fragaria vesca were created using the new 2.51 version of the GeneMark-ES program [1] developed by Paul Burns. This program uses an iterative self-training algorithm to train a GMHMM gene model, which is used for gene prediction inside the iterative training as well as for the final prediction step.
[1] Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506.

Downloads

GeneMark hybrid genes (GFF3 file) fvesca_v1.0_genemark_hybrid.gff3.gz
GeneMark hybrid genes mapped to pseudomolecules (GFF3 file) fvesca_v1.0_genemark_hybrid2pseudo.gff3.gz
GeneMark hybrid CDS (FASTA file) fvesca_v1.0_genemark_hybrid.fna.gz
GeneMark hybrid CDS w/ annotations (FASTA file) fvesca_v1.0_genemark_hybrid.annotated.fna.gz
GeneMark hybrid proteins (FASTA file) fvesca_v1.0_genemark_hybrid.faa.gz
GeneMark hybrid proteins w/ annotations (FASTA file) fvesca_v1.0_genemark_hybrid.annotated.faa.gz
GeneMark ab initio genes (GFF3 file) fvesca_v1.0_genemark_abinitio.gff3.gz
GeneMark ab initio CDS (FASTA file) fvesca_v1.0_genemark_abinitio.fna.gz
GeneMark ab initio proteins (FASTA file) fvesca_v1.0_genemark_abinitio.faa.gz

 

Functional Annotations

Downloads

InterPro motifs for GeneMark ab initio derived transcripts fvesca_v1.0_ab_initio_interproscan_data.txt.gz
InterPro motifs for GeneMark hybrid derived transcripts fvesca_v1.0_hybrid_interproscan_data.txt.gz
InterPro GO for GeneMark ab initio derived transcripts fvesca_v1.0_ab_initio_interpro_go_data.txt.gz
InterPro GO for GeneMark hybrid derived transcripts fvesca_v1.0_hybrid_interpro_go_data.txt.gz

 

Predicted Repeats

LTRs were identified using LTR_STRUC (McCarthy, E.M., and J.F. McDonald. 2003. LTR_STRUC: A novel search and identification program for LTR retrotransposons. Bioinformatics 19: 362-367) and LTRharvest (D. Ellinghaus, S. Kurtz, and U. Willhoeft. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 2008, 9:18).

Downloads

LTRharvest predicted repeats fvesca_v1.0_repeats_ltrharvest.gff3.gz
LTR_STRUC predicted repeats fvesca_v1.0_repeats_ltr_struc.gff3.gz

 

 

Markers

The Rosaceae family Conserved Orthologous Set (COS) markers have been aligned to the F. vesca v1.0 scaffolds.

Downloads

Rosaceae family Conserved Orthologous Set (COS) markers   fvesca_v1.0_COSMarker-Rosaceae_family.gff3               

 

Chloroplast

The chloroplast genome was constructed from Illumina reads and was annotated using DGMA by Aaron Liston at Oregon State University.

Downloads

Chloroplast genes (GFF3 file) fvesca_v1.0_chloroplast.gff3.gz
Chloroplast genes (Genbank file) fvesca_v1.0_chloroplast.gb
Chloroplast (FASTA file) fvesca_v1.0_chloroplast.fna

 

Fragaria Transcript Alignments

This page contains several transcript alignments to the F. vesca v1.0 genome, including a 454 transcript data set, cDNA records and RefSeq from GenBank.

The F. vesca ESTs from 454 sequencing were pre-clustered onto scaffolds and then assembled using MIRA. The assembled clusters were then aligned to the scaffolds using GMAP, and only the single best match for each cluster was kept. The analysis was performed by Todd Mockler and Henry Priest at Oregon State University.
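
The "single best match per cluster" step can be pictured with the short sketch below. It is only an illustration under stated assumptions: the record layout (dictionaries with `cluster`, `scaffold` and `score` keys), the function name and the use of a single numeric score to rank alignments are hypothetical, and the criterion actually used to pick the best GMAP match is not described on this page.

```python
def best_match_per_cluster(alignments):
    """Keep only the highest-scoring alignment for each cluster.

    alignments -- iterable of dicts with 'cluster', 'scaffold' and 'score'
                  keys (a hypothetical layout chosen for this example)

    Returns a dict mapping cluster ID to its best alignment. When two
    alignments of a cluster tie on score, the one seen first is kept.
    """
    best = {}
    for aln in alignments:
        cluster = aln["cluster"]
        if cluster not in best or aln["score"] > best[cluster]["score"]:
            best[cluster] = aln
    return best


if __name__ == "__main__":
    toy_alignments = [
        {"cluster": "c1", "scaffold": "scf1", "score": 91.0},
        {"cluster": "c1", "scaffold": "scf2", "score": 98.5},
        {"cluster": "c2", "scaffold": "scf7", "score": 88.0},
    ]
    for cluster, aln in best_match_per_cluster(toy_alignments).items():
        print(cluster, aln["scaffold"], aln["score"])
```

Because each cluster contributes at most one alignment, a cluster that matches several scaffolds equally well still appears only once in the resulting track.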

Downloads

Fragaria 454 ESTs aligned using GMAP (GFF3 file) fvesca_v1.0_gmap-Fragaria_vesca-454ESTs.gff3.gz
Fragaria cDNA from Genbank (GFF3 file) fvesca_v1.0_genbank-Fragaria-cDNA.gff3.gz
RefSeq mRNA Alignments fvesca_v1.0_refseq.gff3.gz

 

Comparative Alignments

Fragaria iinumae 454 contigs (assembly 21-07-2009) were mapped to the Fragaria vesca reference scaffolds. The F. iinumae contigs are generally short due to low coverage. 

Downloads

Fragaria iinumae alignments from 454 assembly (GFF3 file) fvesca_v1.0_gmap-Fragaria_iinumae-strawberry.gff3.gz
Fragaria iinumae 454 assembly (FASTA file) fragaria_iinumae_454AllContigs.fna.gz
Fragaria iinumae alignments to pseudomolecules (GFF3 file) fvesca_v1.0-LG_nucmer-Fragaria_iinumae.gff3.gz
F. vesca Pawtuckaway fosmid alignments to pseudomolecules (GFF3 file) fvesca_v1.0-LG_nucmer-Pawtuckaway-formids.gff3.gz

 

Gene Index Alignments

Gene indices were obtained from The Gene Index Project (http://compbio.dfci.harvard.edu/tgi/). The following plant species gene indices were obtained: Apple (Malus domestica) v1.0 7-8-08 (MdGI), Arabidopsis thaliana v13.0 6-16-06 (AGI), Beet (Beta vulgaris) v2.0 5-19-08 (BvGI), Brassica napus v3.1 5-31-08 (BnGI), Cotton (Gossypium) v9.0 6-14-08 (CGI), Grape (Vitis vinifera) v6.0 7-30-08 (VvGI), Sunflower (Helianthus annuus) v5.0 (HaGI), Lettuce (Lactuca sativa) v3.0 7-2-08 (LsGI), Medicago truncatula v9.0 7-16-08 (MtGI), Onion (Allium cepa) v2.0 (OnGI), Orange (Citrus sinensis) v1.0 6-25-08 (CsGI), Peach (Prunus persica) v1.0 6-29-08 (PrpeGI), Pepper (Capsicum annuum) v3.0 7-17-08 (CaGI), Pinus v7.0 7-23-08 (PGI), Soybean (Glycine max) v13.0 7-11-08 (GmGI), Sugar Cane (Saccharum officinarum) v2.2 (SoGI), and Tomato (Solanum lycopersicum) v12.0 7-15-08 (LGI). Gene indices were mapped to the genome sequence scaffolds and unassembled contigs using gmap version 2007-09-28 (http://www.gene.com/share/gmap/; Wu and Watanabe, 2005).

Downloads

Allium cepa, onion (GFF3 file) fvesca_v1.0_gmap-Allium_cepa-onion.gff3
Arabidopsis thaliana fvesca_v1.0_gmap-Arabidopsis_thaliana.gff3
Beta vulgaris, beet fvesca_v1.0_gmap-Beta_vulgaris-beet.gff3
Brassica napus, rapeseed fvesca_v1.0_gmap-Brassica_napus-rapeseed.gff3
Capsicum annuum, pepper fvesca_v1.0_gmap-Capsicum_annuum-pepper.gff3
Citrus sinensis, sweet orange fvesca_v1.0_gmap-Citrus_sinensis-orange.gff3
Glycine max, soybean fvesca_v1.0_gmap-Glycine_max-soybean.gff3
Gossypium fvesca_v1.0_gmap-Gossypium.gff3
Helianthus annuus, sunflower fvesca_v1.0_gmap-Helianthus_annuus-sunflower.gff3
Lactuca sativa, lettuce fvesca_v1.0_gmap-Lactuca_sativa-lettuce.gff3
Malus domestica, apple fvesca_v1.0_gmap-Malus_domestica-apple.gff3
Medicago truncatula fvesca_v1.0_gmap-Medicago_truncatula.gff3
Pinus, pine fvesca_v1.0_gmap-Pinus-pine.gff3
Prunus persica, peach fvesca_v1.0_gmap-Prunus_persica-peach.gff3
Saccharum officinarum, sugarcane fvesca_v1.0_gmap-Saccharum_officinarum-sugarcane.gff3
Solanum lycopersicum, tomato fvesca_v1.0_gmap-Solanum_lycopersicum-tomato.gff3
Triticum aestivum, wheat fvesca_v1.0_gmap-Triticum_aestivum-wheat.gff3
Vitis vinifera, grape fvesca_v1.0_gmap-Vitis_vinifera-grape.gff3.gz