Fragaria x ananassa 'Florida Brilliance' Genome v1.0 Assembly & Annotation

Analysis Name: Fragaria x ananassa 'Florida Brilliance' Genome v1.0 Assembly & Annotation
Method: SALSA2 (na)
Source: FaFB2 HiFi and Hi-C reads
Date performed: 2022-09-19


Han et al. Telomere-to-telomere Haplotype-phased Assembly of Octoploid Strawberry (Fragaria × ananassa) Without Parental Information Using HiFi Long Reads and Hi-C Data. To be submitted.

Trio binning was the first algorithm to generate a haplotype-resolved assembly; however, its requirement for parental data often limits haplotype-phased genome assembly in practice. Recently, a new algorithm combining PacBio HiFi reads and Hi-C chromatin interaction data has made it possible to generate fully haplotype-phased genome assemblies without parental data.
Without parental sequencing, we present the first telomere-to-telomere octoploid strawberry genome assembly, consisting of two haploid assemblies (phased-1 and phased-2).

Genome facts and statistics
The phased-1 assembly contained 3,716 contigs with an N50 of 23.7 Mb, and the phased-2 assembly contained 1,226 contigs with an N50 of 26.7 Mb. Only fifteen contigs accounted for 50% of the phased-2 assembly, indicating that individual contigs correspond to chromosomes. In addition, the largest contigs in the phased-1 and phased-2 genome assemblies were each over 36 Mb. Before scaffolding, the Benchmarking Universal Single-Copy Orthologs (BUSCO) scores were 99.2% for the phased-1 assembly and 99.1% for the phased-2 assembly, indicating a high-quality initial assembly. Comparison of the full assembly to whole-genome HiFi reads of ‘Florida Brilliance’ using Merqury showed very high base accuracy (QV > 69.8), which corresponds to a consensus accuracy of about 99.99999% for the combined phased-1 and phased-2 contigs.
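The contiguity and accuracy figures above follow from two simple definitions, sketched below in Python. This is a minimal illustration and not the pipeline used for this assembly: the function names and the toy contig lengths are hypothetical, and the QV conversion assumes the standard Phred scale (QV = -10 log10 of the per-base error rate).

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half the assembly
    size. L50 is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contig lengths given")


def qv_to_accuracy(qv):
    """Convert a Phred-scaled consensus quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10)


# Toy contig lengths in Mb (hypothetical, not taken from the assembly).
toy_lengths = [36, 30, 27, 20, 10, 5, 2]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} Mb, L50 = {l50} contigs")

# QV 69.8 is the lower bound reported for the combined contigs.
print(f"accuracy at QV 69.8 = {qv_to_accuracy(69.8):.9f}")
```

In these terms, the statement that fifteen contigs account for 50% of the phased-2 assembly is an L50 of 15.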
We observed 99.1% complete gene models, with the majority (96.6%) being duplicated complete gene models, in both the phased-1 and phased-2 genome assemblies. The final assembly of ‘Florida Brilliance’ consisted of 784.9 Mb and 781.0 Mb in the phased-1 and phased-2 assemblies, respectively. All 56 pseudo-chromosomes from the phased-1 and phased-2 assemblies contained putative telomere sequences at the 5’ and/or 3’ ends.
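One way to check a pseudo-chromosome for putative telomere sequence is to count telomeric repeat units near each end. The Python sketch below assumes the canonical plant (Arabidopsis-type) repeat TTTAGGG and a fixed end window; the source does not state which motif or method was used for this assembly, so the motif, the window size, and the function names are all assumptions.

```python
import re

# Assumption: canonical plant telomere repeat unit. The motif actually
# used to annotate this assembly is not stated in the source.
PLANT_TELOMERE = "TTTAGGG"


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def telomere_repeats_at_ends(seq, motif=PLANT_TELOMERE, window=10_000):
    """Count motif copies (either strand) in the first and last `window` bases.

    Returns (count_at_start, count_at_end). Matching is case-insensitive
    so that soft-masked (lowercase) sequence is also counted.
    """
    pattern = re.compile(f"{motif}|{revcomp(motif)}", re.IGNORECASE)
    head = seq[:window]
    tail = seq[-window:]
    return len(pattern.findall(head)), len(pattern.findall(tail))


# Toy sequence (hypothetical): repeats on both ends around a filler region.
toy = "CCCTAAA" * 5 + "ACGT" * 50 + "TTTAGGG" * 3
print(telomere_repeats_at_ends(toy, window=40))
```

A high repeat count at one or both ends would be consistent with the "5’ and/or 3’" wording above, although any threshold for calling a telomere is a judgment call.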



The Fragaria x ananassa Florida Brilliance v1.0 assembly files are available in GFF3 and FASTA formats.


Chromosomes (non-masked) (FASTA file) FaFB_v1.0.fasta.gz
Chromosomes (hard-masked) (FASTA file) FaFB_masked_v1.0.fasta.gz
Chromosomes (soft-masked) (FASTA file) FaFB_soft_masked_v1.0.fasta.gz
Repeats (GFF3 file) FaFB_v1.0.repeats.gff3.gz
Repeats (FASTA file) FaFB_v1.0.repeats.fasta.gz
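The gzipped FASTA files listed above can be read without decompressing them to disk. The sketch below streams a file and reports, for each sequence, its total length and the number of lowercase bases, which in a soft-masked file is the repeat-masked portion. It uses only the Python standard library; the function name is hypothetical and the sequence identifiers inside the files are not known from this page.

```python
import gzip


def fasta_stats(path):
    """Yield (sequence_id, total_bases, lowercase_bases) per FASTA record.

    Reads a gzip-compressed FASTA file line by line, so memory use stays
    small even for chromosome-scale sequences.
    """
    name, total, lower = None, 0, 0
    with gzip.open(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    yield name, total, lower
                fields = line[1:].split()
                name = fields[0] if fields else ""
                total, lower = 0, 0
            elif line and name is not None:
                total += len(line)
                lower += sum(1 for base in line if base.islower())
    if name is not None:
        yield name, total, lower


# Example use, assuming the soft-masked file has been downloaded locally:
# for name, total, lower in fasta_stats("FaFB_soft_masked_v1.0.fasta.gz"):
#     print(name, total, f"{lower / total:.1%}" if total else "n/a")
```

Summing the per-sequence totals gives the assembly size, which can be compared against the sizes reported in the statistics section.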


Gene Predictions

The Fragaria x ananassa Florida Brilliance v1.0 genome gene prediction files are available in GFF3 and FASTA formats.


Genes (GFF3 file) FaFB_v1.0.genes.gff3.gz
Gene sequences (FASTA file) FaFB_v1.0.gene.fasta.gz
mRNA sequences (FASTA file) FaFB_v1.0.mRNA.fasta.gz
CDS sequences (FASTA file) FaFB_v1.0.cds.fasta.gz
Protein sequences (FASTA file) FaFB_v1.0.protein.fasta.gz
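A quick sanity check on a downloaded annotation is to tally feature types in the GFF3 file. The sketch below relies only on the GFF3 layout of nine tab-separated columns with the feature type in column 3; the function name is hypothetical, and which feature types actually appear in FaFB_v1.0.genes.gff3.gz (for example gene, mRNA, CDS) is not stated on this page.

```python
import gzip
from collections import Counter


def count_gff3_features(path):
    """Return a Counter of feature types (column 3) in a gzipped GFF3 file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # An optional ##FASTA directive marks embedded sequence data.
            if line.startswith("##FASTA"):
                break
            # Skip comments, other directives, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts


# Example use, assuming the annotation file has been downloaded locally:
# for feature_type, n in count_gff3_features("FaFB_v1.0.genes.gff3.gz").most_common():
#     print(feature_type, n)
```

If the file contains gene features, their count should match the number of records in the corresponding gene FASTA file.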