Overview
Analysis Name | Fragaria x ananassa 'Florida Brilliance' Genome v1.0 Assembly & Annotation |
Method | Hifiasm (0.16.1) |
Source | FaFB1 HiFi and Hi-C reads |
Date performed | 2022-09-19 |
Publication
Han, H., Salinas, N., Barbey, C. R., Jang, Y. J., Fan, Z., Verma, S., Whitaker, V. M., & Lee, S. (2024). A telomere-to-telomere phased genome of an octoploid strawberry reveals a receptor kinase conferring anthracnose resistance. GigaScience. Manuscript submitted for publication.
Background
Trio binning was the first algorithm to generate a haplotype-resolved assembly; however, its requirement for parental data often limits haplotype-phased genome assembly in practice. Recently, a new algorithm combining PacBio HiFi reads and Hi-C chromatin interaction data has generated fully haplotype-phased genome assemblies without parental data.
Without parental sequencing, we present the first telomere-to-telomere octoploid strawberry genome assembly, consisting of two haploid assemblies (phased-1 and phased-2).
Genome facts and statistics
The phased-1 assembly contained 3,716 contigs with an N50 of 23.7 Mb, and the phased-2 assembly contained 1,226 contigs with an N50 of 26.7 Mb. Only fifteen contigs accounted for 50% of the phased-2 assembly, indicating that a contig corresponds roughly to a chromosome. In addition, the largest contigs in the phased-1 and phased-2 genome assemblies were each over 36 Mb. Before scaffolding, the Benchmarking Universal Single-Copy Orthologs (BUSCO) scores were 99.2% in the phased-1 assembly and 99.1% in the phased-2 assembly, indicating a high-quality initial assembly. Comparison of the full assembly to whole-genome sequencing HiFi reads of ‘Florida Brilliance’ using Merqury showed very high base accuracy (QV > 69.8), indicating that 99.99999% of HiFi reads were detected on the combined phased-1 and phased-2 contigs.
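The contiguity and accuracy figures above follow standard definitions, which can be sketched as follows. This is a minimal illustration, not the pipeline used for the assembly: the function names `n50_l50` and `qv_to_accuracy` and the toy contig lengths are hypothetical. N50 is the length of the contig at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half the assembly size; the number of contigs needed to get there (fifteen for phased-2) is often called L50. A Phred-scaled QV maps to a per-base error probability of 10^(-QV/10).

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' contigs cover half the assembly
            return length, count
    raise ValueError("lengths must be non-empty")


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy."""
    return 1 - 10 ** (-qv / 10)


# Toy contig lengths (hypothetical, not taken from the real assembly)
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
print(f"Accuracy at QV 69.8: {qv_to_accuracy(69.8):.9f}")
```

Under these definitions, a QV of 69.8 corresponds to roughly one expected error per ten million bases.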
We observed 99.1% complete gene models, the majority of which (96.6%) were duplicated complete gene models, in both the phased-1 and phased-2 genome assemblies. The final assembly of ‘Florida Brilliance’ consisted of 784.9 Mb and 781.0 Mb in the phased-1 and phased-2 assemblies, respectively. All 56 pseudo-chromosomes from the phased-1 and phased-2 assemblies contained putative telomere sequences at the 5’ and/or 3’ ends.
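One simple way to screen pseudo-chromosome ends for putative telomeres is to count copies of a telomeric repeat near each end. The sketch below is only an illustration under stated assumptions: the record does not say which motif or tool was used, so the Arabidopsis-type plant repeat `TTTAGGG`, the 1 kb window, and the ten-copy threshold are all assumed values, and `has_telomere` is a hypothetical helper name.

```python
def revcomp(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_telomere(seq, motif="TTTAGGG", window=1000, min_copies=10):
    """Return (start_hit, end_hit) for telomere-like repeats at each end.

    motif, window, and min_copies are assumed defaults, not values
    reported for the 'Florida Brilliance' assembly.
    """
    seq = seq.upper()
    motifs = (motif, revcomp(motif))  # check both strand orientations

    def hit(region):
        # str.count tallies non-overlapping occurrences of the motif
        return any(region.count(m) >= min_copies for m in motifs)

    return hit(seq[:window]), hit(seq[-window:])


# Synthetic example: repeats at both ends of a filler sequence
example = "CCCTAAA" * 20 + "ACGT" * 500 + "TTTAGGG" * 20
print(has_telomere(example))
```

A count-based screen like this only flags candidates; it does not distinguish true telomeres from interstitial telomere-like repeats.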