Fragaria x ananassa FL15.89-25 Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Fragaria x ananassa FL15.89-25 Genome v1.0 Assembly & Annotation
Method: hifiasm in trio-binning mode (v0.11)
Source: PacBio HiFi for FL15.89-25, Illumina for parents
Date performed: 2022-06-15

Welcome to the genome of Fragaria ×ananassa 'FL15.89-25'
'FL15.89-25' is a University of Florida (UF) breeding accession whose parents are 'Florida Beauty' and 'FL12.115-10'. It is also a descendant of 'Mara des Bois' after three generations of backcrossing to UF elite material.
The genome is haplotype-phased, so there are separate files for each haplotype.
The prefix Bea denotes the haplotype inherited from 'Florida Beauty', and the prefix F12 denotes the haplotype inherited from 'FL12.115-10'.
Publication

Fan Z, Tieman DM, Knapp SJ, Zerbe P, Famula R, Barbey CR, Folta KM, Amadeu RR, Lee M, Oh Y, Lee S, Whitaker VM. A multi-omics framework reveals strawberry flavor genes and their regulatory elements. The New Phytologist. 2022 Aug 02. doi: 10.1111/nph.18416.

Genome evaluation
The genome of ‘FL15.89-25’ was assembled into 1,480 and 672 phased contigs for the two haploid assemblies, with contig N50 values of 12.8 Mb and 12.4 Mb, respectively, a contiguity similar to that of other recent high-quality octoploid strawberry genomes. A k-mer-based approach using the parental Illumina short reads estimated 97.1% and 99.2% completeness for the two haploid assemblies, corroborated by BUSCO completeness of 98.1% and 98.0% (eudicots_odb10 genes). Phasing quality was evaluated with parent-specific k-mers: the average switch error and Hamming error rates for the F12 haploid assembly were 0.19% and 0.18%, respectively, comparable to phased genomes of other species. The phased contigs were scaffolded into pseudochromosomes by alignment to the ‘Camarosa’ reference genome, which placed 96.0% (795.1 Mb) and 92.8% (778.6 Mb) of the phased contig sequence on 28 pseudochromosomes in the F12 and Bea haploid assemblies, respectively, consistent with previous flow cytometry estimates (720 Mb and 813 Mb; refs. 59, 81). The final scaffolds contained only 88 and 79 gaps, an average of 3.14 and 2.82 per chromosome, for the F12 and Bea assemblies, respectively. Scaffolding quality was evaluated with a linkage map built for ‘FL15.89-25’: 98.3% and 99.1% of 1,676 SNPs were assigned to the correct chromosomes in the F12 and Bea assemblies, respectively. A Hi-C contact map built from public F. ×ananassa Hi-C data also validated the ordering and chromosome assignment of the scaffolds. Alignment of the two haplotypes revealed high collinearity between them.
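The contiguity and phasing metrics above can be illustrated with a short Python sketch. N50 is computed in the standard way. The switch and Hamming error function is a simplified illustration only: it assumes each contig has already been reduced to an ordered series of parental labels ('P' paternal, 'M' maternal), one per parent-specific k-mer found on it. That input format is hypothetical, and the counting rules of the actual evaluation tool (yak) may differ in detail.

```python
from typing import Sequence, Tuple


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def switch_and_hamming(labels: Sequence[str]) -> Tuple[float, float]:
    """Simplified phasing error rates for one contig.

    labels: parental origin ('P' or 'M') of each parent-specific k-mer,
    in contig order (hypothetical input format).
    switch error  = adjacent k-mer pairs with different origin / all adjacent pairs
    Hamming error = k-mers from the minority parent / all k-mers
    """
    if len(labels) < 2:
        return 0.0, 0.0
    switches = sum(a != b for a, b in zip(labels, labels[1:]))
    minority = min(labels.count("P"), labels.count("M"))
    return switches / (len(labels) - 1), minority / len(labels)


# Toy example with made-up numbers (not from this assembly)
print(n50([10, 8, 5, 3, 2]))  # contig lengths in Mb -> 8
print(switch_and_hamming("PPPPMPPP"))
```

A single mis-phased k-mer in the middle of a contig produces two switches but only one Hamming error, which is why the two rates are reported separately.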

Materials and methods:
Genome assembly 
Fragaria ×ananassa ‘FL15.89-25’, which carries multiple favorable alleles of flavor genes, was selected for sequencing. High-molecular-weight DNA was extracted from etiolated leaf tissue and sequenced with high-fidelity (HiFi) long reads on the PacBio Sequel II platform. The gDNA was sheared to an average insert size of ~17 kb. The library was prepared from 6 µg of sheared gDNA and size-selected with BluePippin to enrich large fragments and remove fragments below 8 kb. The size-selected library was sequenced on two 8M SMRT Cells using sequencing chemistry v2 and polymerase v2.0; together, the two cells yielded 31.1 Gb of HiFi reads with an average read length of 15.2 kb. The parents ‘Florida Beauty’ (female) and ‘FL12.115-10’ (male) were sequenced as 150-bp paired-end reads on the Illumina NovaSeq, yielding 35.1 Gb and 33.9 Gb of short-read data, respectively.
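Approximate sequencing depth follows from dividing the yields above by a genome size. The sketch below assumes a haploid genome size of 813 Mb, the upper flow cytometry estimate quoted in the genome evaluation; this choice is an assumption, and the resulting depths change with it. For a heterozygous individual, the HiFi reads are also split between the two haplotypes, so each haplotype receives only part of the total depth.

```python
GB = 1e9
MB = 1e6

# ASSUMPTION: haploid genome size taken from the 813 Mb flow cytometry estimate.
ASSUMED_HAPLOID_SIZE = 813 * MB

# Sequencing yields reported in the methods
YIELDS = {
    "FL15.89-25 HiFi": 31.1 * GB,
    "Florida Beauty Illumina": 35.1 * GB,
    "FL12.115-10 Illumina": 33.9 * GB,
}


def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth = total sequenced bases / genome size."""
    return total_bases / genome_size


for name, bases in YIELDS.items():
    depth = sequencing_depth(bases, ASSUMED_HAPLOID_SIZE)
    print(f"{name}: ~{depth:.1f}x per haploid genome")
```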
The de novo trio-binning assembly was built with hifiasm version 0.11 together with yak version 0.1, using the parameter “-D10” and supplied with the parental short reads and the ‘FL15.89-25’ HiFi reads. The phasing evaluation identified one incorrectly phased contig, which was visualized in Bandage version 0.8.1, split at the breakpoint, and reassigned to the correct haplotype assembly. Pseudochromosomes were constructed with a reference-based approach using RagTag version 1.0 with the parameters “-C -f 10000 --remove-small” and the ‘Camarosa’ reference genome. The unscaffolded contigs were concatenated into chr0.
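The assembly and scaffolding steps can be sketched as a small Python script that prints the corresponding shell commands. It is a reconstruction under stated assumptions, not the authors' script. All file names are hypothetical. The yak options (-k31, -b37) and the practice of passing the short reads twice follow the tools' documented trio-binning usage rather than anything stated above. The contig FASTA files are assumed to have been converted from hifiasm's GFA output beforehand, and that step is not shown. Only "-D10" and the RagTag parameters come from the text.

```python
import shlex
from typing import List

# Hypothetical input files (not from the original analysis).
PAT_READS = ["F12_R1.fq.gz", "F12_R2.fq.gz"]  # 'FL12.115-10', male parent
MAT_READS = ["Bea_R1.fq.gz", "Bea_R2.fq.gz"]  # 'Florida Beauty', female parent
HIFI_READS = "FL15.89-25.hifi.fq.gz"
CAMAROSA = "camarosa_v1.0.fasta"


def yak_count(out_yak: str, reads: List[str], threads: int = 16) -> str:
    """Build a yak k-mer counting command for one parent's short reads.

    With a Bloom filter (-b37), yak reads its input twice, so the same
    decompressed stream is given two times via bash process substitution.
    """
    stream = "<(zcat " + " ".join(shlex.quote(r) for r in reads) + ")"
    return f"yak count -k31 -b37 -t{threads} -o {shlex.quote(out_yak)} {stream} {stream}"


def hifiasm_trio(prefix: str, pat_yak: str, mat_yak: str, hifi: str,
                 threads: int = 32) -> str:
    """Build a hifiasm trio-binning command (-1 paternal, -2 maternal yak databases)."""
    args = ["hifiasm", "-o", prefix, "-t", str(threads), "-D10",
            "-1", pat_yak, "-2", mat_yak, hifi]
    return " ".join(shlex.quote(a) for a in args)


def ragtag_scaffold(reference: str, contigs: str, out_dir: str) -> str:
    """Build a RagTag scaffolding command with the parameters from the text.

    -C places unplaced contigs in chr0; -f 10000 sets the minimum unique
    alignment length; --remove-small drops small unique alignments.
    """
    args = ["ragtag.py", "scaffold", "-C", "-f", "10000", "--remove-small",
            "-o", out_dir, reference, contigs]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Print a bash script (process substitution requires bash, not sh).
    print(yak_count("pat.yak", PAT_READS))
    print(yak_count("mat.yak", MAT_READS))
    print(hifiasm_trio("fxa", "pat.yak", "mat.yak", HIFI_READS))
    print(ragtag_scaffold(CAMAROSA, "F12_contigs.fasta", "ragtag_F12"))
    print(ragtag_scaffold(CAMAROSA, "Bea_contigs.fasta", "ragtag_Bea"))
```

The male parent ('FL12.115-10', F12) is passed as the paternal database, so the F12 haplotype corresponds to hifiasm's paternal output.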
Genome annotation
A repeat library was constructed using EDTA version 1.9.0 with “--sensitive 1”, which runs a RepeatModeler search to identify remaining TEs. The TE annotation library was generated by EDTA in a separate run. TE regions of both haploid assemblies were masked with RepeatMasker version 4.1.1 using the repeat library. Protein-coding genes were annotated following the MAKER-P annotation pipeline. In the initial run, MAKER integrated transcript and protein evidence. Transcript evidence included the Fragaria ×ananassa GDR RefTrans V1 (https://www.rosaceae.org/analysis/230) and non-overlapping transcripts assembled from over 20 genotypes: RNA-seq reads were first cleaned with Trimmomatic version 0.39, mapped to the new assemblies without a gene annotation file using STAR version 2.7.6a, and assembled into a unified transcript set with PsiCLASS version 1.0.1. Protein evidence came from the curated UniProt (https://www.uniprot.org/) protein database for Tracheophyta, with transposases filtered out. In subsequent iterative runs, the ab initio gene predictors SNAP and Augustus were trained and used for gene prediction. Functional annotations were assigned based on the UniProt/Swiss-Prot protein database and InterProScan (iprscan) version 5.50. Genes retained in the final set were either supported by evidence (AED < 1.0) or encoded a Pfam domain. KEGG K-numbers were assigned to annotated genes with KofamKOALA.
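The final-set rule (AED < 1.0 or a Pfam domain) can be sketched as a filter over MAKER GFF3 output. This assumes that MAKER writes the annotation edit distance as an `_AED` attribute on mRNA features and that the set of transcript IDs with a Pfam hit has already been extracted, for example from InterProScan results. The feature IDs in the example are hypothetical, and this is an illustration rather than the authors' filtering script.

```python
from typing import Dict, Iterable, Iterator, Set


def parse_attributes(field: str) -> Dict[str, str]:
    """Split GFF3 column 9 ("key=value;key=value") into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def final_mrna_ids(gff_lines: Iterable[str], pfam_ids: Set[str],
                   max_aed: float = 1.0) -> Iterator[str]:
    """Yield IDs of mRNAs kept by the rule: _AED < max_aed or has a Pfam domain."""
    for line in gff_lines:
        if line.startswith("#"):
            continue  # GFF3 directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # skip other features, blank lines, embedded FASTA
        attrs = parse_attributes(cols[8])
        mrna_id = attrs.get("ID")
        aed = float(attrs.get("_AED", "1.0"))  # missing AED treated as unsupported
        if mrna_id and (aed < max_aed or mrna_id in pfam_ids):
            yield mrna_id


gff = [
    "##gff-version 3",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1;_AED=0.12",
    "chr1\tmaker\tmRNA\t1000\t1900\t.\t+\t.\tID=m2;Parent=g2;_AED=1.00",
    "chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=m3;Parent=g3;_AED=1.00",
]
print(list(final_mrna_ids(gff, pfam_ids={"m3"})))  # ['m1', 'm3']
```

Here m2 is dropped because it has neither evidence support nor a Pfam domain, while m3 is kept on the strength of its domain alone.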


If you have any questions, please contact fanzhen@ufl.edu.

Homology

Homology of the Fragaria x ananassa FL15.89-25 Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. Hits with an E-value below 1e-9 were retained for NCBI nr (Release 2021-09), and hits with an E-value below 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/Swiss-Prot (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09). The best-hit reports are available for download in Excel format.
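A best-hit report of this kind can be sketched from BLAST+ tabular output (`-outfmt 6`, default 12 columns) using the per-database E-value cutoffs stated above. Choosing the hit with the highest bit score as the "best hit" is an assumption, since the selection criterion is not specified. The protein and subject IDs in the example are made up.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# E-value cutoffs from the text, keyed by (hypothetical) database labels
CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(rows: Iterable[str], max_evalue: float) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit below the cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for rec in csv.DictReader(rows, fieldnames=FIELDS, delimiter="\t"):
        evalue = float(rec["evalue"])
        if evalue >= max_evalue:
            continue
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bits)
    return best


rows = [
    "p1\tAT1G01010.1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t200",
    "p1\tAT2G01010.1\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t150",
    "p2\tAT3G01010.1\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-5\t40",
]
print(best_hits(rows, CUTOFFS["araport11"]))
```

Proteins absent from the returned dictionary correspond to the "without homologs" (noHit) sets: p2 is excluded here because its E-value (1e-5) does not pass the 1e-6 cutoff.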

 

Protein Homologs

Fragaria x ananassa v1.0 proteins with NCBI nr homologs (EXCEL file) fxananassa_FL15.89-25_v1.0_vs_nr.xlsx.gz
Fragaria x ananassa v1.0 proteins with NCBI nr homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_nr_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without NCBI nr homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_nr_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) fxananassa_FL15.89-25_v1.0_vs_arabidopsis.xlsx.gz
Fragaria x ananassa v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_arabidopsis_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_arabidopsis_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with SwissProt homologs (EXCEL file) fxananassa_FL15.89-25_v1.0_vs_swissprot.xlsx.gz
Fragaria x ananassa v1.0 proteins with SwissProt homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_swissprot_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without SwissProt homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_swissprot_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with TrEMBL homologs (EXCEL file) fxananassa_FL15.89-25_v1.0_vs_trembl.xlsx.gz
Fragaria x ananassa v1.0 proteins with TrEMBL homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_trembl_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without TrEMBL homologs (FASTA file) fxananassa_FL15.89-25_v1.0_vs_trembl_noHit.fasta.gz

 

Assembly

The Fragaria x ananassa FL15.89-25 Genome v1.0 assembly files are available in FASTA format.

Downloads

Chromosomes, F12 haplotype (FASTA file) fxananassa_F12_v1.0.fasta.gz
Chromosomes, Bea haplotype (FASTA file) fxananassa_Bea_v1.0.fasta.gz

 

Gene Predictions

The Fragaria x ananassa FL15.89-25 Genome v1.0 gene prediction files are available in FASTA and GFF3 formats.

Downloads

Protein sequences, F12 haplotype (FASTA file) fxananassa_F12_v1.0.proteins.fasta.gz
CDS, F12 haplotype (FASTA file) fxananassa_F12_v1.0.cds.fasta.gz
Genes, F12 haplotype (GFF3 file) fxananassa_F12_v1.0.genes.gff3.gz
Protein sequences, Bea haplotype (FASTA file) fxananassa_Bea_v1.0.proteins.fasta.gz
CDS, Bea haplotype (FASTA file) fxananassa_Bea_v1.0.cds.fasta.gz
Genes, Bea haplotype (GFF3 file) fxananassa_Bea_v1.0.genes.gff3.gz

 

Functional Analysis

Functional annotations for the Fragaria x ananassa FL15.89-25 Genome v1.0 are available for download below. The proteins were analyzed with InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed with the KEGG Automatic Annotation Server (KAAS).
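A gene-to-GO table like the one offered below can be sketched from InterProScan's tab-separated output. The sketch assumes the standard TSV layout, in which column 1 is the protein accession and column 14 holds pipe-separated GO IDs (present only when InterProScan is run with GO lookup enabled, and "-" when there are none). A regular expression is used so that GO IDs are still picked up if a version appends a source tag such as "(InterPro)". The example line is made up.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

GO_RE = re.compile(r"GO:\d{7}")


def genes_to_go(tsv_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the GO terms assigned to each protein across all its InterProScan matches."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14:
            continue  # match without a GO column
        terms = GO_RE.findall(cols[13])
        if terms:
            mapping[cols[0]].update(terms)
    return dict(mapping)


line = ("p1\tmd5\t300\tPfam\tPF00069\tProtein kinase domain\t10\t250\t1.2E-50\tT\t"
        "01-01-2022\tIPR000719\tProtein kinase domain\tGO:0004672|GO:0005524|GO:0006468")
print(genes_to_go([line]))
```

Because a protein usually has several matches (Pfam, PANTHER, and so on), the per-protein sets merge duplicate GO IDs reported by different member databases.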

Downloads

GO assignments from InterProScan fxananassa_FL15.89-25_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan fxananassa_FL15.89-25_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs fxananassa_FL15.89-25_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways fxananassa_FL15.89-25_v1.0_KEGG-pathways.xlsx.gz

 

Transcript Alignments
Transcript alignments were performed by the GDR team of the Main Bioinformatics Lab at WSU. BLAT was used to map transcripts to the Fragaria x ananassa FL15.89-25 genome assembly, and alignments covering at least 97% of the transcript length with at least 97% identity were retained. The available files are in GFF3 format.
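The 97% length and 97% identity thresholds can be sketched as a filter over BLAT's PSL output. The exact formulas used for these alignments are not stated, so the definitions here are assumptions: coverage is taken as aligned bases (matches + mismatches + repeat matches) divided by the query size, and identity as (matches + repeat matches) divided by aligned bases. UCSC's own percent-identity calculation also penalizes gaps and would give slightly different values.

```python
from typing import Iterable, Iterator


def filter_psl(lines: Iterable[str], min_cov: float = 0.97,
               min_ident: float = 0.97) -> Iterator[str]:
    """Yield query names of PSL alignments passing both thresholds (assumed definitions)."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 21 or not cols[0].isdigit():
            continue  # PSL header or malformed line
        matches, mismatches, rep_matches = int(cols[0]), int(cols[1]), int(cols[2])
        q_size = int(cols[10])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        identity = (matches + rep_matches) / aligned
        coverage = aligned / q_size
        if identity >= min_ident and coverage >= min_cov:
            yield cols[9]  # qName


psl = [
    "psLayout version 3",
    "980\t10\t0\t0\t0\t0\t0\t0\t+\ttx1\t1000\t0\t990\tchr1\t5000\t100\t1090\t1\t990,\t0,\t100,",
    "900\t5\t0\t0\t0\t0\t0\t0\t+\ttx2\t1000\t0\t905\tchr1\t5000\t200\t1105\t1\t905,\t0,\t200,",
]
print(list(filter_psl(psl)))  # ['tx1']
```

The hypothetical transcript tx2 aligns with high identity but covers only about 90% of its length, so it fails the coverage threshold.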

 

Fragaria x ananassa GDR RefTrans v1 fxananassa_FL15.89-25_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 fxananassa_FL15.89-25_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 fxananassa_FL15.89-25_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 fxananassa_FL15.89-25_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 fxananassa_FL15.89-25_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 fxananassa_FL15.89-25_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 fxananassa_FL15.89-25_v1.0_pyrus_GDR_reftransV1