Fragaria vesca Whole Genome v2.0.a2 (Re-annotation of v2.0.a1)
Yongping Li, Wei Wei, Feng Jia, Huifeng Luo, Mengting Pi, Zhongchi Liu, and Chunying Kang (2017). Genome re-annotation of the wild strawberry Fragaria vesca using extensive Illumina- and SMRT-based RNA-seq datasets. DNA Research, doi: 10.1093/dnares/dsx038.
The wild diploid strawberry species Fragaria vesca is an ideal model system for cultivated strawberry (Fragaria x ananassa, octoploid) and other Rosaceae family crops. The genome of F. vesca was first sequenced in 2011 from a fourth-generation inbred line of Hawaii4 (F. vesca ssp. vesca) using only second-generation short-read technologies, and its genome annotation v1.1 contains only protein-coding genes derived from ab initio gene predictions. An updated annotation (v1.1.a2) later became available; it used 50 RNA-seq libraries generated from 25 different fruit tissue types to improve the annotation accuracy of protein-coding genes. In the meantime, a refined assembly of the F. vesca reference genome, named Fvb, was generated based on dense linkage maps of the North American diploid F. vesca ssp. bracteata. Compared to the original genome FvH4, Fvb features many translocations and inversions. Although Fvb still contains a large number of gaps that could be closed with long-read sequencing technology, it has a much shorter unanchored pseudo-chromosome. The annotation for Fvb (v2.0.a1) relied mainly on ab initio predictions and included only predicted coding sequences, so an improved annotation is highly desirable. Here, a new annotation version named v2.0.a2 was created for the Fvb genome by a pipeline utilizing one PacBio library, 90 Illumina RNA-seq libraries, and 9 small RNA-seq libraries.
Genome annotation facts and statistics
Annotation v2.0.a2 of the Fvb genome contains a final set of 33,538 protein-coding loci with 50,738 transcripts. The locus IDs of 30,012 identical or modified genes are unchanged between the older annotation (v2.0.a1) and this annotation (v2.0.a2), while 3,525 genes (novel or split) were newly numbered using the same nomenclature as previous annotations (i.e., GeneXXXXX). Altogether, 18,641 genes (55.6% of the 33,538 genes) were augmented with information on the 5’ and/or 3’ UTRs, 13,168 (39.3%) protein-coding genes were modified or newly identified, and 7,370 genes were found to possess alternative isoforms. In addition, 1,938 long non-coding RNAs, 171 miRNAs, and 51,714 small RNA clusters were integrated into the annotation.
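The percentages quoted above can be reproduced from the raw counts. The short Python sketch below is only an illustrative check. The helper name `percent` is hypothetical, and the mean number of transcripts per locus is a derived figure that the annotation summary itself does not state.

```python
# Counts taken from the annotation summary above.
TOTAL_LOCI = 33_538         # protein-coding loci
TOTAL_TRANSCRIPTS = 50_738  # transcripts across all loci
WITH_UTR = 18_641           # genes augmented with 5' and/or 3' UTR information
MODIFIED_OR_NEW = 13_168    # genes modified or newly identified


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"Genes with UTR information: {percent(WITH_UTR, TOTAL_LOCI)}%")
print(f"Modified or new genes:      {percent(MODIFIED_OR_NEW, TOTAL_LOCI)}%")
# Derived value, not reported in the summary: mean transcripts per locus.
print(f"Transcripts per locus:      {TOTAL_TRANSCRIPTS / TOTAL_LOCI:.2f}")
```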
All annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can use the FTP repository for bulk download.
The Fragaria vesca v2.0.a2 gene prediction files are available in FASTA and GFF3 formats.
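As a quick sanity check after downloading, a GFF3 file can be summarized by counting feature types. The sketch below is a minimal example that assumes only the standard nine-column, tab-separated GFF3 layout. The function name `count_feature_types`, the sample records, and the file name in the usage note are hypothetical, and the feature types actually present in the v2.0.a2 file (for example `gene` or `mRNA`) should be confirmed against the file itself.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3) across all annotation records."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation records
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed records rather than guessing
        counts[fields[2]] += 1
    return counts


# Made-up sample records for illustration only; not real annotation data.
sample = [
    "##gff-version 3",
    "Fvb1\texample\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "Fvb1\texample\tmRNA\t100\t900\t.\t+\t.\tID=geneA.t1;Parent=geneA",
    "Fvb1\texample\tmRNA\t100\t800\t.\t+\t.\tID=geneA.t2;Parent=geneA",
]
sample_counts = count_feature_types(sample)
print(sample_counts)
```

To summarize a downloaded file, pass an open file handle instead of the sample list, for example `with open("annotation.gff3") as handle: print(count_feature_types(handle))`, where `annotation.gff3` stands in for whatever file name the download page provides.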