Fragaria vesca Whole Genome v4.0.a1 Assembly & Annotation
Edger PP, VanBuren R, Colle M, Poorten TJ, Wai CM, Niederhuth CE, Alger EI, Ou S, Acharya CB, Wang J, Callow P, McKain MR, Shi J, Collier C, Xiong Z, Mower JP, Slovin JP, Hytönen T, Jiang N, Childs KL, Knapp SJ. 2017. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity. GigaScience, gix124, 13 December 2017.
Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce 'platinum' quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio).
Genome annotation facts and statistics
This assembly has a contig N50 length of ~7.9 Mb, representing a ~300-fold improvement over the previous version. The vast majority (>99.8%) of the assembly was anchored to seven pseudomolecules using two sets of optical maps from Bionano Genomics. We obtained ~24.96 million base pairs (Mb) of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1,496 new genes. Comparative syntenic analyses uncovered numerous large-scale scaffolding errors in every chromosome of the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.
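Contig N50 is the statistic behind the ~7.9 Mb figure above. As a minimal sketch of the standard definition (not the authors' own tooling), the N50 is the length of the contig at which the running total of contig lengths, sorted longest first, first reaches at least half of the total assembly size. The function name `n50` is illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths supplied")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if running * 2 >= total:
            return length
```

For example, `n50([8, 6, 4, 2])` returns 6: the total is 20, the longest contig alone covers only 8, and adding the next contig brings the running sum to 14, which passes the halfway mark of 10.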
Homology of the Fragaria vesca v4.0.a1 transcripts was determined by pairwise sequence comparison using the blastx algorithm against several protein databases. An expectation value (e-value) cutoff of 1e-9 was used for the NCBI nr database (Release 2017-07), and a cutoff of 1e-6 for the Arabidopsis proteins (TAIR10), UniProt SwissProt (Release 2017-11), and UniProt TrEMBL (Release 2017-11) databases. The best hit reports are available for download in Excel format.
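The e-value filtering described above can be sketched as follows, assuming blastx was run with the 12-column tabular output (`-outfmt 6`), where column 11 is the e-value and column 12 the bit score. The database labels in `EVALUE_CUTOFFS`, the function name `best_hits`, and the choice of highest bit score as the "best hit" are assumptions for illustration; the released Excel reports may rank hits differently.

```python
import csv
import io

# E-value cutoffs from the text above; the dictionary keys are hypothetical labels.
EVALUE_CUTOFFS = {
    "nr": 1e-9,
    "tair10": 1e-6,
    "swissprot": 1e-6,
    "trembl": 1e-6,
}


def best_hits(tabular_text, database):
    """Return {query_id: (subject_id, evalue, bitscore)}.

    For each query, keep the highest-bit-score hit whose e-value is
    strictly below the cutoff for the given database. Expects the
    12-column BLAST tabular format (-outfmt 6); '#' comment lines are skipped.
    """
    cutoff = EVALUE_CUTOFFS[database]
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # hit does not pass the database-specific cutoff
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Queries with no hit below the cutoff are simply absent from the returned dictionary, which mirrors a "no significant hit" entry in a best-hit report.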
All annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can use the FTP repository for bulk download.
Fragaria vesca v4.0.a1
Functional annotation for the Fragaria vesca v4.0.a1 genome is available for download below. The Fragaria vesca transcripts were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
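A minimal sketch of collecting InterPro and GO assignments per transcript from InterProScan results, assuming the standard InterProScan 5 tab-separated output produced with `--goterms`, where column 12 holds the InterPro accession and column 14 the pipe-separated GO terms, and "-" or a missing column means no value. The function name `collect_annotations` is hypothetical. In case a version appends extra text to each GO term, the regular expression extracts only the `GO:` identifiers.

```python
import re

# Matches bare GO identifiers such as GO:0004672, ignoring any surrounding text.
GO_RE = re.compile(r"GO:\d{7}")


def collect_annotations(tsv_text):
    """Map each sequence ID to its set of InterPro accessions and GO terms.

    Assumes InterProScan 5 TSV columns: 1 = sequence ID,
    12 = InterPro accession, 14 = GO terms (only present with --goterms).
    """
    interpro = {}
    go = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        seq_id = cols[0]
        interpro.setdefault(seq_id, set())
        go.setdefault(seq_id, set())
        # Column 12 (index 11): InterPro accession, "-" when unassigned.
        if len(cols) > 11 and cols[11] not in ("", "-"):
            interpro[seq_id].add(cols[11])
        # Column 14 (index 13): GO terms; "-" yields no regex matches.
        if len(cols) > 13:
            go[seq_id].update(GO_RE.findall(cols[13]))
    return interpro, go
```

Sets are used because several member-database signatures (for example Pfam and PROSITE) often map to the same InterPro entry and GO terms for one transcript.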