Fragaria vesca Whole Genome v4.0.a2 (Re-annotation of v4.0.a1)
Yongping Li, Mengting Pi, Qi Gao, Zhongchi Liu & Chunying Kang (2019). Updated annotation of the wild strawberry Fragaria vesca V4 genome. Horticulture Research, volume 6, Article number: 61.
The cultivated strawberry (Fragaria × ananassa, 8x) is an economically important crop worldwide. The diploid strawberry Fragaria vesca serves as an ideal model plant for cultivated strawberry as well as the Rosaceae family. F. vesca is the most widely distributed diploid Fragaria species in nature and is considered to be one of the progenitors of the cultivated strawberry. Due to these features, its genome was initially assembled using short reads (<300 bp) of DNA sequence from a fourth-generation inbred line of Hawaii (F. vesca ssp. vesca), called FvH4, and reassembled based on dense linkage maps of the North American diploid F. vesca ssp. bracteata, called Fvb. Recently, the F. vesca V4 genome, a near-complete genome with a contig N50 of approximately 7.9 million base pairs (Mb), was assembled using long reads generated by Pacific Biosciences (PacBio) from Hawaii. This high-quality genome provides a better reference for genomic and transcriptomic analyses of F. vesca. The v2.0.a annotation was mapped to the V4 genome, and the annotation pipeline was rerun with the previous datasets, including a total of 97 RNA-seq libraries generated from floral and fruit tissues at different developmental stages, as well as from seedlings, leaves, meristems, and roots. By combining these two sets of results, an updated annotation, v4.0.a2, was obtained; it includes 34,007 protein-coding genes with 98.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCOs). The newly added genes were then carefully characterized. Additionally, the expression levels of all the genes across different floral and fruit tissue types are provided in the supplementary table. Moreover, a total of 84 known and 63 novel miRNAs were identified, and their targets were predicted. Overall, the new annotation and gene expression data provide valuable resources for future studies.
Genome annotation facts and statistics
The new and improved annotation of the F. vesca V4 genome, v4.0.a2, contains a total of 34,007 gene models with 98.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCOs). In this v4.0.a2 annotation, the gene models of 8,342 existing genes were modified, 9,029 new genes were added, and 10,176 genes possess alternatively spliced isoforms, with an average of 1.90 transcripts per locus.
Homology of the Fragaria vesca genome v4.0.a2 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
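As an illustration of how a best hit report can be derived from such a search, the following minimal sketch filters blastp results by E-value and keeps the highest-scoring hit per query. It assumes the results are in the standard 12-column BLAST tabular layout (E-value in column 11, bit score in column 12); the database labels, the function name `best_hits`, and the sample identifiers are hypothetical and are not taken from the distributed files, and the exact procedure used to build the published reports is not described here.

```python
import csv
import io

# E-value cutoffs quoted in the text above; the dictionary keys are
# hypothetical labels chosen for this sketch.
CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(tabular_text, cutoff):
    """Return {query: (subject, evalue, bitscore)} for hits with evalue < cutoff.

    Assumes the 12-column BLAST tabular layout. When a query has several
    qualifying hits, the one with the highest bit score is kept.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#") or len(row) < 12:
            continue  # skip blank lines, comments, and malformed rows
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # hit does not pass the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows for demonstration only (not real gene or protein identifiers).
SAMPLE = "\n".join([
    "geneA.t1\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "geneA.t1\tsubj2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t250",
    "geneB.t1\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t45",
])

if __name__ == "__main__":
    for label, cutoff in CUTOFFS.items():
        print(label, best_hits(SAMPLE, cutoff))
```

With the stricter nr cutoff, the made-up `geneB.t1` hit is discarded, while it is retained under the 1e-6 cutoff used for the other databases.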
All annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.
The Fragaria vesca v4.0.a2 gene prediction files are available in FASTA and GFF3 formats.
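For readers who want to check isoform statistics against the GFF3 file, the following is a minimal sketch that counts transcripts per gene. It assumes that genes are recorded as `gene` features and transcripts as `mRNA` features whose `Parent` attribute points to the gene `ID`, which is a common GFF3 convention but has not been verified for this particular file. The function name and the sample records are hypothetical.

```python
from collections import Counter


def transcripts_per_gene(gff3_lines):
    """Map each gene ID to its number of mRNA children.

    Assumes 9-column GFF3 records with 'gene' and 'mRNA' feature types
    and ID=/Parent= attributes in column 9.
    """
    genes = set()
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature record
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if cols[2] == "gene" and "ID" in attrs:
            genes.add(attrs["ID"])
        elif cols[2] == "mRNA" and "Parent" in attrs:
            counts[attrs["Parent"]] += 1
    return {gene: counts.get(gene, 0) for gene in genes}


# Made-up records for demonstration only (not real gene identifiers).
SAMPLE_GFF3 = [
    "##gff-version 3",
    "chr1\tsketch\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsketch\tmRNA\t100\t900\t.\t+\t.\tID=gene1.t1;Parent=gene1",
    "chr1\tsketch\tmRNA\t100\t800\t.\t+\t.\tID=gene1.t2;Parent=gene1",
    "chr1\tsketch\tgene\t2000\t2500\t.\t-\t.\tID=gene2",
    "chr1\tsketch\tmRNA\t2000\t2500\t.\t-\t.\tID=gene2.t1;Parent=gene2",
]

if __name__ == "__main__":
    per_gene = transcripts_per_gene(SAMPLE_GFF3)
    multi = sum(1 for n in per_gene.values() if n > 1)
    mean = sum(per_gene.values()) / len(per_gene)
    print(f"genes: {len(per_gene)}, multi-isoform genes: {multi}, mean transcripts: {mean:.2f}")
```

To apply the sketch to a downloaded file, pass an open file handle (for example `open(path)`, where `path` is wherever the GFF3 file was saved) in place of `SAMPLE_GFF3`.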
Functional annotations for the Fragaria vesca genome v4.0.a2 are available for download below. The Fragaria vesca genome v4.0.a2 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).