Fragaria x ananassa Camarosa Genome Assembly v1.0 & Annotation v1.0.a1

Overview
Analysis Name: Fragaria x ananassa Camarosa Genome Assembly v1.0 & Annotation v1.0.a1
Method: Illumina, 10X Genomics, and Pacific Biosciences
Source: Fragaria x ananassa Camarosa Genome Assembly v1.0.a1
Date performed: 2019-02-28

Publication

Edger PP, Poorten TJ, VanBuren R, Hardigan MA, Colle M, McKain MR, Smith RD, Teresi SJ, Nelson ADL, Wai CM, Alger EI, Bird KA, Yocca AE, Pumplin N, Ou S, Ben-Zvi G, Brodt A, Baruch K, Swale T, Shiue L, Acharya CB, Cole GS, Mower JP, Childs KL, Jiang N, Lyons E, Freeling M, Puzey JR, Knapp SJ. Origin and evolution of the octoploid strawberry genome. Nature Genetics. 2019 Feb 25.

 

About the Assembly

Overview

A near-complete chromosome-scale assembly for cultivated octoploid strawberry (Fragaria × ananassa).

 

Sequencing, Assembly, and Annotation

The genome of the cultivar 'Camarosa' was sequenced using a combination of short- and long-read approaches, including Illumina (San Diego, CA), 10X Genomics (Pleasanton, CA), and Pacific Biosciences (PacBio; Menlo Park, CA), totaling 615-fold coverage of the genome. Illumina (455-fold coverage) and 10X Genomics (117-fold coverage) data were assembled and scaffolded using the software package DenovoMAGIC3 (NRGene, Nes Ziona, Israel). The genome was further scaffolded to chromosome scale using Hi-C data (401-fold coverage) in combination with the HiRise pipeline (Dovetail, Santa Cruz, CA), and gap-filled with 43-fold coverage of error-corrected PacBio reads using PBJelly. The total length of the final assembly is 805,488,706 bp distributed across 28 chromosome-level pseudomolecules, representing ~99% of the estimated genome size based on flow cytometry measurements. A genetic map for F. x ananassa was used to correct mis-assemblies, and comparisons to F. vesca were used to identify homoeologous chromosomes.
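
The coverage and size figures above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the paragraph. The Illumina, 10X Genomics, and PacBio figures add up to the quoted 615-fold total, which suggests (an inference, not a statement from the source) that the Hi-C coverage is counted separately. The implied genome size is approximate because the "~99%" figure is itself rounded.

```python
# Figures quoted in the assembly description.
coverage = {"Illumina": 455, "10X Genomics": 117, "PacBio": 43}
hi_c_coverage = 401           # scaffolding data; assumed not to be part of the 615-fold total
assembly_bp = 805_488_706     # total length of the final assembly
fraction_of_estimate = 0.99   # "~99%" of the flow cytometry estimate (rounded)

# Sum of the sequencing coverage used for assembly and gap filling.
total_coverage = sum(coverage.values())

# Genome size implied by the assembly length and the ~99% figure.
implied_genome_bp = assembly_bp / fraction_of_estimate

print(f"Sequencing coverage: {total_coverage}-fold")
print(f"Implied genome size: roughly {implied_genome_bp / 1e6:.0f} Mb")
```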

A total of 108,087 protein-coding genes were annotated, along with 30,703 long non-coding RNA (lncRNA) genes, which are divided into 15,621 long intergenic ncRNAs (lincRNAs), 9,265 antisense overlapping transcripts (AOT-lncRNAs), and 5,817 sense overlapping transcripts (SOT-lncRNAs). Gene annotation and genome assembly quality were evaluated using the Benchmarking Universal Single-Copy Orthologs V2 (BUSCO) method. Most (99.17%) of the 1,440 core genes in the embryophyta dataset were identified in the annotation, supporting a high-quality genome assembly. The repetitive components of the nuclear genome were annotated using a custom repeat library approach, including DNA transposons, long-terminal-repeat retrotransposons (LTR-RTs; e.g., Copia and Gypsy), and non-LTR retrotransposons. TE-related sequences make up ~36% of the total genome assembly, with LTR-RTs being the most abundant (~28%). The plastid and mitochondrial genomes were also assembled, annotated, and verified for completeness.
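
As a quick consistency check on the annotation figures, the three lncRNA classes can be summed and the BUSCO percentage converted back to a gene count. The sketch below uses only the numbers quoted above. The BUSCO count is derived by rounding and is not a figure reported by the source.

```python
# lncRNA classes as quoted in the annotation summary.
lncrna_classes = {
    "long intergenic": 15_621,
    "antisense overlapping": 9_265,
    "sense overlapping": 5_817,
}
reported_lncrna_total = 30_703

# The three classes should account for every annotated lncRNA gene.
lncrna_sum = sum(lncrna_classes.values())
print(f"lncRNA classes sum to {lncrna_sum:,} (reported: {reported_lncrna_total:,})")

# Convert the BUSCO percentage back to an approximate number of core genes found.
busco_core_genes = 1_440
busco_percent_found = 99.17
busco_found = round(busco_core_genes * busco_percent_found / 100)
print(f"About {busco_found} of {busco_core_genes} BUSCO core genes identified")
```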

Homology

Homology of the Fragaria x ananassa Camarosa genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for NCBI nr (Release 2018-05), and 1e-6 for the Arabidopsis proteins (TAIR10), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
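
The cutoff-and-best-hit step described above can be sketched as follows. This is an illustration, not the pipeline actually used: it assumes the blastp results are in the standard 12-column tabular format (`-outfmt 6`), which the source does not state, and the function name `best_hits` and the example rows are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def best_hits(rows: Iterable[str], max_evalue: float) -> Dict[str, List[str]]:
    """Keep, per query, the hit with the lowest e-value below the cutoff.

    Assumes BLAST tabular columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best: Dict[str, List[str]] = {}
    for fields in csv.reader(rows, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue, bitscore = fields[0], float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # the cutoff is "less than" the threshold
        current = best.get(query)
        # Lower e-value wins; ties are broken by the higher bit score.
        if current is None or (evalue, -bitscore) < (
            float(current[10]),
            -float(current[11]),
        ):
            best[query] = fields
    return best


# Made-up rows for illustration only.
example = [
    "geneA\tsp|P1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "geneA\tsp|P2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120",
    "geneB\tsp|P3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-4\t40",
]
hits = best_hits(example, max_evalue=1e-6)
for query, fields in hits.items():
    print(query, fields[1], fields[10])
```

With the 1e-6 cutoff, geneB has no qualifying hit and would fall into a "no hit" set; for NCBI nr the same function would be called with `max_evalue=1e-9`.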

 

Protein Homologs

Fragaria x ananassa v1.0 proteins with NCBI nr homologs (EXCEL file) F_x_ananassa_v1.0_vs_nr.xlsx.gz
Fragaria x ananassa v1.0 proteins with NCBI nr (FASTA file) F_x_ananassa_v1.0_vs_nr_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without NCBI nr (FASTA file) F_x_ananassa_v1.0_vs_nr_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with Arabidopsis (TAIR10) homologs (EXCEL file) F_x_ananassa_v1.0_vs_arabidopsis.xlsx.gz
Fragaria x ananassa v1.0 proteins with Arabidopsis (TAIR10) (FASTA file) F_x_ananassa_v1.0_vs_arabidopsis_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without Arabidopsis (TAIR10) (FASTA file) F_x_ananassa_v1.0_vs_arabidopsis_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with SwissProt homologs (EXCEL file) F_x_ananassa_v1.0_vs_swissprot.xlsx.gz
Fragaria x ananassa v1.0 proteins with SwissProt (FASTA file) F_x_ananassa_v1.0_vs_swissprot_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without SwissProt (FASTA file) F_x_ananassa_v1.0_vs_swissprot_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with TrEMBL homologs (EXCEL file) F_x_ananassa_v1.0_vs_trembl.xlsx.gz
Fragaria x ananassa v1.0 proteins with TrEMBL (FASTA file) F_x_ananassa_v1.0_vs_trembl_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without TrEMBL (FASTA file) F_x_ananassa_v1.0_vs_trembl_noHit.fasta.gz

 

Download

All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Assembly

The Fragaria x ananassa Camarosa v1.0 genome assembly file is available in FASTA format.

Downloads

Fragaria x ananassa Camarosa genome assembly v1.0, hardmasked (FASTA file) F_ana_Camarosa_6-28-17_hardmasked.fasta.gz
Fragaria x ananassa Camarosa genome assembly v1.0, unmasked (FASTA file) F_ana_Camarosa_6-28-17_unmasked.fasta.gz
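
After downloading, the assembly can be sanity-checked by counting sequences and summing their lengths. The following is a minimal sketch using only the Python standard library; the helper names `sequence_lengths` and `summarize` are hypothetical, and whether the file contains sequences beyond the chromosome-level pseudomolecules is not stated here, so the counts should be read from the file rather than assumed.

```python
import gzip
from typing import Dict, Iterable


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(path: str) -> None:
    """Print the sequence count and total length of a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        lengths = sequence_lengths(handle)
    print(f"{len(lengths)} sequences, {sum(lengths.values()):,} bp in total")


# Tiny in-memory example; for the downloaded file one would call
# summarize("F_ana_Camarosa_6-28-17_unmasked.fasta.gz")
demo = sequence_lengths([">chr1 demo", "ACGT", "AC", ">chr2", "NNNN"])
print(demo)
```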

Gene Predictions

Gene prediction files for the Fragaria x ananassa Camarosa genome v1.0.a1:

Downloads

Long intergenic ncRNAs (lincRNAs) (GTF file) F_ana_lincRNAs_masked_transcripts_removed.gtf.gz
Predicted Genes (GFF3 file) Fxa_v1.2_makerStandard_MakerGenes_woTposases.gff.gz
Transcript sequences (FASTA file) Fxa_v1.2_makerStandard_transcripts_woTposases.fasta.gz
Protein sequences (FASTA file) Fxa_v1.2_makerStandard_proteins_woTposases.fasta.gz
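
A simple way to inspect the predicted genes file is to tally the feature types in its third column. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, `#` for comment lines). The function name `count_feature_types` and the example records are hypothetical, and the feature types actually present in the download may differ.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 records by feature type (column 3)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature record (e.g. embedded FASTA)
        counts[fields[2]] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr1\tmaker\texon\t100\t400\t.\t+\t.\tParent=g1.t1",
    "chr1\tmaker\texon\t600\t900\t.\t+\t.\tParent=g1.t1",
]
counts = count_feature_types(example)
for feature_type, n in counts.most_common():
    print(feature_type, n)
```

For the downloaded file, the same function can be applied to a handle opened with `gzip.open(path, "rt")`.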

Functional Analysis

Functional annotations for the Fragaria x ananassa Camarosa genome v1.0.a1 are available for download below. The Fragaria x ananassa Camarosa genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan F_x_ananassa_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan F_x_ananassa_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs F_x_ananassa_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways F_x_ananassa_v1.0_KEGG-pathways.xlsx.gz

 

The following functional annotations for the Fragaria x ananassa Camarosa genome v1.0.a1 were provided by the original institute:

Downloads

New GeneID FxaC_newGeneID.txt.gz
InterproScan Results Fxa_v1.2_makerStandard_proteins_iprscan.txt.gz
PFam with new GeneID Fxa_v1.2_newGeneIDs_maxPfam.txt.gz

 

Repeats

The repetitive components of the nuclear genome were annotated using a custom repeat library approach, including DNA transposons, long-terminal-repeat retrotransposons (LTR-RTs; e.g., Copia and Gypsy), and non-LTR retrotransposons. TE-related sequences make up ~36% of the total genome assembly, with LTR-RTs being the most abundant (~28%).

Downloads

Fragaria x ananassa Camarosa v1.0 Repeats (FASTA file) Fxa_allRepeats.lib.fa.gz