Fragaria x ananassa 'Florida Brilliance' Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Fragaria x ananassa 'Florida Brilliance' Genome v1.0 Assembly & Annotation
Method: Hifiasm (0.16.1)
Source: FaFB1 HiFi and Hi-C reads
Date performed: 2022-09-19

Publication

Han, H., Salinas, N., Barbey, C. R., Jang, Y. J., Fan, Z., Verma, S., Whitaker, V. M., & Lee, S. (2024). A telomere-to-telomere phased genome of an octoploid strawberry reveals a receptor kinase conferring anthracnose resistance. GigaScience. Manuscript submitted for publication.

Background
Trio binning was the first algorithm to generate a haplotype-resolved assembly; however, its requirement for parental data often limits haplotype-phased genome assembly in practice. More recently, a new algorithm combining PacBio HiFi reads and Hi-C chromatin interaction data has generated fully haplotype-phased genome assemblies without parental data.
Without parental sequencing, we present the first telomere-to-telomere octoploid strawberry genome assembly, consisting of two haploid assemblies (phased-1 and phased-2).

Genome facts and statistics
The phased-1 assembly contained 3,716 contigs with an N50 of 23.7 Mb, and the phased-2 assembly contained 1,226 contigs with an N50 of 26.7 Mb. Only fifteen contigs accounted for 50% of the phased-2 assembly, indicating that a contig corresponds to a chromosome. In addition, the largest contigs in the phased-1 and phased-2 genome assemblies were each over 36 Mb. Before scaffolding, the Benchmarking Universal Single-Copy Orthologs (BUSCO) scores were 99.2% for the phased-1 assembly and 99.1% for the phased-2 assembly, indicating a high-quality initial assembly. Comparison of the full assembly to whole-genome HiFi sequencing reads of ‘Florida Brilliance’ using Merqury showed very high base accuracy (QV > 69.8), indicating that 99.99999% of HiFi reads were detected on the combined phased-1 and phased-2 contigs.
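To make the contiguity and accuracy figures above concrete, here is a minimal sketch of how N50, L50 (the number of contigs needed to reach half of the assembly, fifteen for phased-2) and a Phred-scaled QV relate to raw numbers. The function names and toy contig lengths are illustrative assumptions, not part of the assembly pipeline. Merqury estimates QV from k-mers; only the standard Phred conversion is shown here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # N50: length of the contig that crosses the halfway point.
            # L50: how many contigs were needed to get there.
            return length, count
    raise ValueError("no contig lengths given")


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1 - 10 ** (-qv / 10)


# Toy lengths (hypothetical, not taken from the assembly).
toy_lengths = [40, 30, 20, 5, 5]
print(n50_l50(toy_lengths))   # (30, 2)
print(qv_to_accuracy(69.8))   # roughly one error per ten million bases
```

Under this conversion, a QV near 70 corresponds to an error rate of about 1 in 10 million bases.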
We observed 99.1% complete gene models, with the majority (96.6%) being duplicated complete gene models, in both the phased-1 and phased-2 genome assemblies. The final assembly of ‘Florida Brilliance’ consisted of 784.9 Mb in the phased-1 assembly and 781.0 Mb in the phased-2 assembly. All 56 pseudo-chromosomes from the phased-1 and phased-2 assemblies contained putative telomere sequences at the 5’ and/or 3’ ends.
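A check for putative telomere sequences at chromosome ends can be sketched as follows. The source does not name the repeat motif or the thresholds used, so this example assumes the Arabidopsis-type plant telomere repeat (TTTAGGG, reverse complement CCCTAAA). The window size, minimum repeat count and function name are hypothetical.

```python
import re

# Assumption: Arabidopsis-type plant telomere repeat; the motif actually
# used for this assembly is not stated in the source.
FORWARD = "TTTAGGG"
REVERSE = "CCCTAAA"


def telomere_ends(seq, window=1000, min_repeats=3):
    """Report whether tandem telomere repeats occur near each end of seq."""
    pattern = re.compile(
        f"(?:(?:{FORWARD}){{{min_repeats},}}|(?:{REVERSE}){{{min_repeats},}})"
    )
    seq = seq.upper()  # soft-masked (lowercase) bases are still checked
    return {
        "start": bool(pattern.search(seq[:window])),
        "end": bool(pattern.search(seq[-window:])),
    }


# Toy sequence (hypothetical): repeats at both ends, filler in between.
toy = REVERSE * 4 + "ACGT" * 500 + FORWARD * 4
print(telomere_ends(toy, window=100))  # {'start': True, 'end': True}
```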

 

Assembly

The Fragaria x ananassa Florida Brilliance v1.0 assembly files are available in GFF3 and FASTA formats. From each of the 28 parental pairs of octoploid strawberry chromosomes, we selected the most contiguous pseudomolecule from the corresponding phase-1/phase-2 parental sets to produce an optimal haploid genome assembly labelled ‘FaFB1’.
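The per-chromosome selection described above can be illustrated with a small sketch. The source does not say how contiguity was measured, so this example assumes that "most contiguous" means the fewest gap runs (stretches of N). The chromosome names, function names and tie-breaking rule are hypothetical.

```python
import re


def gap_count(seq):
    """Count runs of N (assembly gaps) in a sequence."""
    return len(re.findall(r"[Nn]+", seq))


def pick_most_contiguous(phase1, phase2):
    """For each shared chromosome, choose the phase with fewer gap runs.

    phase1 and phase2 map chromosome name -> sequence.
    Ties go to Phase-1 (an arbitrary choice for this sketch).
    """
    chosen = {}
    for chrom in sorted(phase1.keys() & phase2.keys()):
        fewer_in_1 = gap_count(phase1[chrom]) <= gap_count(phase2[chrom])
        chosen[chrom] = "Phase-1" if fewer_in_1 else "Phase-2"
    return chosen


# Toy input (hypothetical names and sequences).
p1 = {"chrA": "ACGTNNNACGT", "chrB": "ACGTACGT"}
p2 = {"chrA": "ACGTACGTACG", "chrB": "ACNNGTNNAC"}
print(pick_most_contiguous(p1, p2))  # {'chrA': 'Phase-2', 'chrB': 'Phase-1'}
```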

Downloads

Chromosomes (FaFB1 non-masked)(FASTA file) FaFB1_v1.0.fasta.gz
Chromosomes (FaFB1 hard-masked)(FASTA file) FaFB1_masked_v1.0.fasta.gz
Chromosomes (FaFB1 soft-masked)(FASTA file) FaFB1_soft_masked_v1.0.fasta.gz
Chromosomes (Phase-1 non-masked)(FASTA file) Phase-1_v1.0.fasta.gz
Chromosomes (Phase-1 hard-masked)(FASTA file) Phase-1_masked_v1.0.fasta.gz
Chromosomes (Phase-1 soft-masked)(FASTA file) Phase-1_soft_masked_v1.0.fasta.gz
Chromosomes (Phase-2 non-masked)(FASTA file) Phase-2_v1.0.fasta.gz
Chromosomes (Phase-2 hard-masked)(FASTA file) Phase-2_masked_v1.0.fasta.gz
Chromosomes (Phase-2 soft-masked)(FASTA file) Phase-2_soft_masked_v1.0.fasta.gz
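
The three variants of each assembly differ only in how repeats are represented. By the usual convention, soft-masked files mark repeats in lowercase and hard-masked files replace them with N. The sketch below reads a gzipped FASTA file and reports both fractions per sequence; the function names are illustrative, and it assumes the files follow that convention.

```python
import gzip


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    name, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_fraction(seq):
    """Return (soft, hard): fraction of lowercase bases and of N bases."""
    if not seq:
        return 0.0, 0.0
    soft = sum(1 for base in seq if base.islower() and base != "n")
    hard = sum(1 for base in seq if base in "Nn")
    return soft / len(seq), hard / len(seq)
```

For example, looping over `read_fasta("FaFB1_soft_masked_v1.0.fasta.gz")` and calling `masked_fraction` on each sequence would give the per-chromosome repeat fraction. N bases in the non-masked files would reflect assembly gaps rather than masking.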
Gene Predictions

The Fragaria x ananassa Florida Brilliance v1.0 genome gene prediction files are available in GFF3 and FASTA formats.

Downloads

Genes (FaFB1 GFF3 file) FaFB1_v1.0.genes.gff3.gz
Gene sequences (FaFB1 FASTA file) FaFB1_v1.0.gene.fasta.gz
mRNA sequences (FaFB1 FASTA file) FaFB1_v1.0.mRNA.fasta.gz
Protein sequences (FaFB1 FASTA file) FaFB1_v1.0.protein.fasta.gz
Genes (Phase-1 GFF3 file) Phase-1_v1.0.genes.gff3.gz
Gene sequences (Phase-1 FASTA file) Phase-1_v1.0.gene.fasta.gz
mRNA sequences (Phase-1 FASTA file) Phase-1_v1.0.mRNA.fasta.gz
Protein sequences (Phase-1 FASTA file) Phase-1_v1.0.protein.fasta.gz
Genes (Phase-2 GFF3 file) Phase-2_v1.0.genes.gff3.gz
Gene sequences (Phase-2 FASTA file) Phase-2_v1.0.gene.fasta.gz
mRNA sequences (Phase-2 FASTA file) Phase-2_v1.0.mRNA.fasta.gz
Protein sequences (Phase-2 FASTA file) Phase-2_v1.0.protein.fasta.gz
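
As a quick sanity check after downloading a gene prediction file, the feature types in a GFF3 file can be tallied with the standard library alone. This is a minimal sketch that relies only on the nine tab-separated columns defined by GFF3. The function names are illustrative, and which feature types these particular files contain is not stated in the source.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (feature type) across an iterable of GFF3 lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # ignores any embedded FASTA section
            counts[fields[2]] += 1
    return counts


def count_features_in_file(path):
    """Open a plain or gzipped GFF3 file and tally its feature types."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)
```

Calling `count_features_in_file("FaFB1_v1.0.genes.gff3.gz")` would return a `Counter` keyed by feature type.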