Fragaria vesca Whole Genome v4.0.a2 (Re-annotation of v4.0.a1)

Method: An optimized annotation pipeline mainly using MAKER2 and BUSCO
Source: Fragaria vesca Whole Genome v4.0.a2 Assembly & Annotation
Date performed: 2019-05-09


Yongping Li, Mengting Pi, Qi Gao, Zhongchi Liu & Chunying Kang (2019). Updated annotation of the wild strawberry Fragaria vesca V4 genome. Horticulture Research, volume 6, Article number: 61.


The cultivated strawberry (Fragaria × ananassa, 8x) is an economically important crop worldwide. The diploid strawberry Fragaria vesca serves as an ideal model plant for the cultivated strawberry as well as for the Rosaceae family. F. vesca is the most widely distributed naturally occurring diploid Fragaria species and is considered to be one of the progenitors of the cultivated strawberry. Owing to these features, its genome was initially assembled using short DNA sequence reads (<300 bp) from a fourth-generation inbred line of Hawaii (F. vesca ssp. vesca), called FvH4, and was later reassembled based on dense linkage maps of the North American diploid F. vesca ssp. bracteata, called Fvb. Recently, the F. vesca V4 genome, a near-complete genome with a contig N50 of approximately 7.9 million base pairs (Mb), was assembled from the Hawaii line using long reads generated by Pacific Biosciences (PacBio). This high-quality genome provides a better reference for genomic and transcriptomic analyses of F. vesca. The v2.0.a annotation was mapped to the V4 genome, and the annotation pipeline was rerun with the previous datasets, which include a total of 97 RNA-seq libraries generated from floral and fruit tissues at different developmental stages, as well as from seedlings, leaves, meristems, and roots. By combining these two sets of results, an updated annotation, v4.0.a2, was obtained; it includes 34,007 protein-coding genes with 98.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCOs). The newly added genes were then carefully characterized. Additionally, the expression levels of all genes across different floral and fruit tissue types are provided in the supplementary table. Moreover, a total of 84 known and 63 novel miRNAs were identified, and their targets were predicted. Overall, the new annotation and gene expression data provide valuable resources for future studies.


Genome annotation facts and statistics

v4.0.a2 is the new and improved annotation of the F. vesca V4 genome. It contains a total of 34,007 gene models, with 98.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCOs). In v4.0.a2, the gene models of 8,342 existing genes were modified and 9,029 new genes were added; in addition, 10,176 genes have alternatively spliced isoforms, with an average of 1.90 transcripts per locus.
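Gene and isoform counts like these can be recomputed from the genes GFF3 file. The following is a minimal Python sketch, assuming that transcripts use the feature type `mRNA` and point to their gene through the `Parent` attribute (common in MAKER-derived GFF3, but not verified for this particular file). Because "average transcripts per locus" can be taken over all genes or only over genes with multiple isoforms, the sketch reports both. The function names `open_text`, `parse_attributes`, and `isoform_stats` are hypothetical, and the results are not guaranteed to match the published figures exactly.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def isoform_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Count genes and mRNAs per gene from GFF3 lines.

    Assumption: transcripts use the feature type 'mRNA' and reference their
    gene through the Parent attribute.
    """
    genes = set()
    mrnas_per_gene: Dict[str, int] = defaultdict(int)
    for line in lines:
        # Skip blank lines, comments, and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        feature_type, attrs = cols[2], parse_attributes(cols[8])
        if feature_type == "gene" and "ID" in attrs:
            genes.add(attrs["ID"])
        elif feature_type == "mRNA" and "Parent" in attrs:
            # GFF3 allows several comma-separated parents.
            for parent in attrs["Parent"].split(","):
                mrnas_per_gene[parent] += 1
    counts = list(mrnas_per_gene.values())
    multi = [n for n in counts if n > 1]
    return {
        "genes": len(genes),
        "genes_with_isoforms": len(multi),
        # Mean over every gene that has at least one mRNA.
        "mean_transcripts_per_gene": sum(counts) / len(counts) if counts else 0.0,
        # Mean over genes with two or more mRNAs only.
        "mean_transcripts_per_isoform_gene": sum(multi) / len(multi) if multi else 0.0,
    }
```

For example, `with open_text("Fragaria_vesca_v4.0.a2.genes.gff3.gz") as fh: print(isoform_stats(fh))` would stream the downloaded file without decompressing it to disk first.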


All annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page describes the available files and provides download links. Alternatively, you can use the FTP repository for bulk download.

Gene Predictions

The Fragaria vesca v4.0.a2 gene prediction files are available in FASTA and GFF3 formats.


Genes (GFF3 file) Fragaria_vesca_v4.0.a2.genes.gff3.gz
CDS (FASTA file) Fragaria_vesca_v4.0.a2.cds.fa.gz
Proteins (FASTA file) Fragaria_vesca_v4.0.a2.proteins.fa.gz
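The gzipped FASTA files can be read directly without decompressing them first. Below is a minimal Python sketch (the function names `open_text`, `read_fasta`, and `length_summary` are hypothetical) that streams either the CDS or the protein file and summarizes sequence lengths. It assumes only standard FASTA layout and takes the first whitespace-delimited token of each header line as the sequence identifier; the exact header format of these files is not described on this page. Protein sequences may end with a `*` stop symbol, which this sketch counts toward the length.

```python
import gzip
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines.

    The identifier is the first whitespace-delimited token after '>'.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            # Sequence lines may be wrapped; join them per record.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_summary(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Return the number of sequences and their total, min, and max lengths."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        return {"count": 0, "total": 0, "min": 0, "max": 0}
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }
```

For example, `with open_text("Fragaria_vesca_v4.0.a2.proteins.fa.gz") as fh: print(length_summary(read_fasta(fh)))` would report how many protein sequences the file holds and their length range.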