Fragaria vesca Whole Genome v4.0.a1 Assembly & Annotation

Overview
Analysis Name: Fragaria vesca Whole Genome v4.0.a1 Assembly & Annotation
Method: Canu Assembler
Source: Pacific Biosciences reads
Date performed: 2018-01-11

Publication

Edger PP, VanBuren R, Colle M, Poorten TJ, Wai CM, Niederhuth CE, Alger EI, Ou S, Acharya CB, Wang J, Callow P, McKain MR, Shi J, Collier C, Xiong Z, Mower JP, Slovin JP, Hytönen T, Jiang N, Childs KL, Knapp SJ. 2017. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity. GigaScience, gix124. 13 December 2017.

Background

Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce 'platinum' quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio).

Genome annotation facts and statistics

This assembly has a contig N50 length of ~7.9 Mb, representing a ~300-fold improvement over the previous version. The vast majority (>99.8%) of the assembly was anchored to seven pseudomolecules using two sets of optical maps from Bionano Genomics. We obtained ~24.96 million base pairs (Mb) of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1,496 new genes. Comparative syntenic analyses uncovered numerous large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.
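For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the F. vesca assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the cumulative sum of
    lengths, taken from longest to shortest, first reaches at least
    half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical values, not from this assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # total is 300; 80 + 70 = 150 reaches half, so N50 is 70
```

A higher N50 at the same total length means the sequence is held in fewer, longer contigs, which is why it is reported as a contiguity measure.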

Homology

Homology of the Fragaria vesca v4.0.a1 transcripts was determined by pairwise sequence comparison using the blastx algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for NCBI nr (Release 2017-07), and less than 1e-6 for the Arabidopsis proteins (TAIR10), UniProt SwissProt (Release 2017-11), and UniProt TrEMBL (Release 2017-11) databases. The best-hit reports are available for download in Excel format.
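The cutoff-and-best-hit step described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes BLAST results in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and it picks the best hit per query by highest bit score. The page does not state the output format or the tie-breaking rule actually used, and the function name `best_hits` and all identifiers in the example data are hypothetical.

```python
import csv
import io


def best_hits(handle, max_evalue):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Only hits with an e-value strictly below max_evalue are considered;
    among those, the hit with the highest bit score is kept.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular BLAST rows (IDs and scores are made up).
example = (
    "tx1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "tx1\tprotB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t230\n"
    "tx2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t1e-7\t45\n"
)

# With the stricter 1e-9 cutoff, tx2 (e-value 1e-7) is dropped.
strict = best_hits(io.StringIO(example), max_evalue=1e-9)
# With the 1e-6 cutoff, tx2 is retained.
relaxed = best_hits(io.StringIO(example), max_evalue=1e-6)
print(sorted(strict), sorted(relaxed))
```

Queries absent from the returned dictionary correspond to the "without" (noHit) category in the file lists on this page.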

 

Protein Homologs

Fragaria vesca v4.0.a1 transcripts with NCBI nr homologs (EXCEL file) Fragaria_vesca_v4.0.a1_vs_nr.xlsx
Fragaria vesca v4.0.a1 transcripts with NCBI nr (FASTA file) Fragaria_vesca_v4.0.a1_vs_nr_hit.fasta
Fragaria vesca v4.0.a1 transcripts without NCBI nr (FASTA file) Fragaria_vesca_v4.0.a1_vs_nr_noHit.fasta
Fragaria vesca v4.0.a1 transcripts with Arabidopsis (TAIR10) homologs (EXCEL file) Fragaria_vesca_v4.0.a1_vs_tair.xlsx
Fragaria vesca v4.0.a1 transcripts with Arabidopsis (TAIR10) (FASTA file) Fragaria_vesca_v4.0.a1_vs_tair_hit.fasta
Fragaria vesca v4.0.a1 transcripts without Arabidopsis (TAIR10) (FASTA file) Fragaria_vesca_v4.0.a1_vs_tair_noHit.fasta
Fragaria vesca v4.0.a1 transcripts with ExPASy SwissProt homologs (EXCEL file) Fragaria_vesca_v4.0.a1_vs_swissprot.xlsx
Fragaria vesca v4.0.a1 transcripts with ExPASy SwissProt (FASTA file) Fragaria_vesca_v4.0.a1_vs_swissprot_hit.fasta
Fragaria vesca v4.0.a1 transcripts without ExPASy SwissProt (FASTA file) Fragaria_vesca_v4.0.a1_vs_swissprot_noHit.fasta
Fragaria vesca v4.0.a1 transcripts with ExPASy TrEMBL homologs (EXCEL file) Fragaria_vesca_v4.0.a1_vs_trembl.xlsx
Fragaria vesca v4.0.a1 transcripts with ExPASy TrEMBL (FASTA file) Fragaria_vesca_v4.0.a1_vs_trembl_hit.fasta
Fragaria vesca v4.0.a1 transcripts without ExPASy TrEMBL (FASTA file) Fragaria_vesca_v4.0.a1_vs_trembl_noHit.fasta

 

Downloads

All annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Assembly

Downloads

Pseudomolecule (FASTA file) Fragaria vesca v4.0.a1.fasta.gz

 

Gene Predictions

Downloads

mRNA sequences (FASTA file) Fragaria vesca v4.0.a1_CDs.fasta.gz
Protein sequences  (FASTA file) Fragaria vesca v4.0.a1_prot.fasta.gz
Transposable element sequences (FASTA file) Fragaria vesca v4.0.a1_TE_Library.fasta.gz
Gene models (GFF3 file) Fragaria vesca v4.0.a1_gene_models.gff3.gz
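As a quick sanity check on a downloaded gene-model file, gene features can be tallied per sequence. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, with the sequence ID in column 1 and the feature type in column 3); the function name `count_genes`, the sequence IDs, and the local file path in the comment are hypothetical, and the real file's feature types should be checked before relying on the counts.

```python
import gzip
import io
from collections import Counter


def count_genes(lines):
    """Count features of type 'gene' per sequence ID in GFF3 lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "gene":
            counts[fields[0]] += 1
    return counts


# Hypothetical GFF3 content (sequence IDs and coordinates are made up).
example = io.StringIO(
    "##gff-version 3\n"
    "chrA\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chrA\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "chrA\tsrc\tgene\t1500\t2400\t.\t-\t.\tID=g2\n"
    "chrB\tsrc\tgene\t50\t700\t.\t+\t.\tID=g3\n"
)
gene_counts = count_genes(example)
print(dict(gene_counts))

# For a gzipped download, the same function accepts a text-mode handle
# (the path below is a placeholder, not a verified file name):
# with gzip.open("gene_models.gff3.gz", "rt") as handle:
#     print(count_genes(handle))
```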

 

Functional Analysis

Fragaria vesca v4.0.a1

Functional annotations for the Fragaria vesca v4.0.a1 genome are available for download below. The Fragaria vesca transcripts were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

InterPro Domains for Fragaria vesca v4.0.a1 transcripts (EXCEL file) Fragaria_vesca_v4.0.a1_IRP.xlsx
Gene Ontology annotations for Fragaria vesca v4.0.a1 transcripts (EXCEL file) Fragaria_vesca_v4.0.a1_GO.xlsx
Fragaria vesca v4.0.a1 transcripts mapped to KEGG Pathways (EXCEL file) Fragaria_vesca_v4.0.a1_KEGG_pathway.xlsx
Fragaria vesca v4.0.a1 transcripts mapped to KEGG Orthologs (EXCEL file) Fragaria_vesca_v4.0.a1_KEGG_ortholog.xlsx