Cerasus speciosa IZO01 Genome v1.0 Assembly and Annotation

Overview
Analysis Name: Cerasus speciosa IZO01 Genome v1.0 Assembly and Annotation
Method: Hifiasm (v0.19.5-r587)
Source: PacBio HiFi reads and Omni-C reads for Cerasus speciosa
Date performed: 2024-05-28

Publication: Fujiwara, K., Toyoda, A., Biswa, B. B., Kishida, T., Tsuruta, M., Nakamura, Y., Kimura, N., Kawamoto, S., Sato, Y., Katsuki, T., Sakura 100 Genome Consortium, & Koide, T. (2025). A Near Complete Genome Assembly of the Oshima Cherry Cerasus speciosa. Scientific Data, 12, Article 162. https://www.nature.com/articles/s41597-025-04388-z

Abstract:

The Oshima cherry (Cerasus speciosa), which is endemic to Japan, has significant cultural and horticultural value. In this study, we present a near complete telomere-to-telomere genome assembly for C. speciosa, derived from the old growth “Sakurakkabu” tree on Izu Oshima Island. Using Illumina short-read, PacBio long-read, and Hi-C sequencing, we constructed a 269.3 Mbp genome assembly with a contig N50 of 32.0 Mbp. We examined the distribution of repetitive sequences in the assembled genome and identified regions that appeared to be centromeric. Detailed structural analysis of these putative centromeric regions revealed that the centromeric regions of C. speciosa comprised repetitive sequences with monomer lengths of 166 or 167 bp. Comparative genomic analysis with Prunus sensu lato genome revealed structural variations and conserved syntenic regions. This high-quality reference genome provides a crucial tool for studying the genetic diversity and evolutionary history of Cerasus species, facilitating advancements in horticultural research and the preservation of this iconic species.

Genome assembly and annotation statistics for C. speciosa.

Genome assembly statistics          Value

BUSCO analysis
  lineage dataset                   eudicots_odb10
  Completeness                      98.4% (2289/2326)
  Complete single-copy              95.9% (2231/2326)
  Complete duplicated               2.5% (58/2326)
  Fragmented                        0.5% (11/2326)
  Missing                           1.1% (26/2326)

Compleasm analysis
  lineage dataset                   eudicots_odb10
  Completeness                      99.1% (2306/2326)
  Complete single-copy              97.0% (2257/2326)
  Complete duplicated               2.1% (49/2326)
  Fragmented                        0.4% (9/2326)
  Missing                           0.5% (11/2326)

Inspector analysis
  reads-to-contigs mapping rate     99.99%
  QV                                35.89 (99.97% accuracy)

Merqury analysis
  QV                                QV > 67
  error rate                        1.80 × 10^-7
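
The QV and accuracy/error-rate figures in the table are related through the standard Phred scale, QV = -10 · log10(error rate). The short sketch below only checks that the reported pairs are consistent with that formula; the function names are illustrative and are not part of Inspector or Merqury.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


# Values taken from the statistics table above.
inspector_accuracy = 1 - qv_to_error_rate(35.89)
merqury_qv = error_rate_to_qv(1.80e-7)

print(f"Inspector QV 35.89 -> accuracy {inspector_accuracy:.2%}")
print(f"Merqury error rate 1.80e-7 -> QV {merqury_qv:.2f}")
```

Both conversions agree with the table: a QV of 35.89 corresponds to roughly 99.97% accuracy, and an error rate of 1.80 × 10^-7 corresponds to a QV just above 67.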


Homology

Homology of the Cerasus speciosa IZO01 genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2024-03), and UniProtKB/TrEMBL (Release 2024-03) databases. The best hit reports are available for download in Excel format.

Protein Homologs

C. speciosa IZO01 v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Cspeciosa_IZO01_v1.0_vs_arabidopsis.xlsx.gz
C. speciosa IZO01 v1.0 proteins with Arabidopsis (Araport11) (FASTA file) Cspeciosa_IZO01_v1.0_vs_arabidopsis_hit.fasta.gz
C. speciosa IZO01 v1.0 proteins without Arabidopsis (Araport11) (FASTA file) Cspeciosa_IZO01_v1.0_vs_arabidopsis_noHit.fasta.gz
C. speciosa IZO01 v1.0 proteins with SwissProt homologs (EXCEL file) Cspeciosa_IZO01_v1.0_vs_swissprot.xlsx.gz
C. speciosa IZO01 v1.0 proteins with SwissProt (FASTA file) Cspeciosa_IZO01_v1.0_vs_swissprot_hit.fasta.gz
C. speciosa IZO01 v1.0 proteins without SwissProt (FASTA file) Cspeciosa_IZO01_v1.0_vs_swissprot_noHit.fasta.gz
C. speciosa IZO01 v1.0 proteins with TrEMBL homologs (EXCEL file) Cspeciosa_IZO01_v1.0_vs_trembl.xlsx.gz
C. speciosa IZO01 v1.0 proteins with TrEMBL (FASTA file) Cspeciosa_IZO01_v1.0_vs_trembl_hit.fasta.gz
C. speciosa IZO01 v1.0 proteins without TrEMBL (FASTA file) Cspeciosa_IZO01_v1.0_vs_trembl_noHit.fasta.gz
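
The E-value cutoff and best-hit selection described in this section can be illustrated with a minimal sketch. It assumes blastp results in the default 12-column tabular layout (BLAST+ `-outfmt 6`), which is not necessarily how the distributed Excel reports were produced. The sequence identifiers in the sample are placeholders, and picking the highest bit score as the "best" hit is an assumption.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff stated in the Homology section


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for hits with E-value < cutoff.

    Assumes the default outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # fails the E-value cutoff
        current = best.get(query)
        # Assumption: the best hit is the one with the highest bit score.
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Placeholder records, not real results.
SAMPLE = "\n".join([
    "query_1\tsubject_A\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-80\t250",
    "query_1\tsubject_B\t40.0\t180\t108\t0\t10\t189\t3\t182\t1e-20\t95.1",
    "query_2\tsubject_C\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.002\t32.0",
])

hits = best_hits(SAMPLE)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

In this sample, `query_2` has no hit below the cutoff, so it would fall into the "noHit" group in the terminology of the download list above.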

Assembly

The Cerasus speciosa IZO01 Genome v1.0 assembly files are available in FASTA format.

Downloads

Chromosomes (FASTA file) CerSpe_IZO01_v1.0.a1.fasta.gz
Repeats (FASTA file) CerSpe_IZO01_v1.0.a1.repeat.fa.gz
Repeats (GFF3 file) CerSpe_IZO01_v1.0.a1.repeat.gff3.gz

Gene Predictions

The Cerasus speciosa IZO01 v1.0.a1 genome gene prediction files are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file) CerSpe_IZO01_v1.0.a1.genes.gff3.gz
Protein sequences (FASTA file) CerSpe_IZO01_v1.0.a1.protein.fa.gz
Transcript sequences (FASTA file) CerSpe_IZO01_v1.0.a1.transcript.fa.gz
CDS sequences (FASTA file) CerSpe_IZO01_v1.0.a1.cds.fa.gz
ncRNA (GFF file) CerSpe_IZO01_v1.0.a1.ncRNA.gff.gz
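
As a quick sanity check after downloading a gene prediction file, the feature types in a GFF3 file can be tallied with the Python standard library alone. This is a minimal sketch: the sample records are invented for illustration, and the file path in the usage note must point to a local copy of the download.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and directives."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a feature line (for example, embedded FASTA)
        counts[fields[2]] += 1
    return counts


def count_feature_types_in_file(path):
    """Same as count_feature_types, reading a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)


# Invented sample records, used only to show the expected input shape.
SAMPLE_GFF3 = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\tCDS\t150\t800\t.\t+\t0\tID=cds1;Parent=mrna1",
]

sample_counts = count_feature_types(SAMPLE_GFF3)
print(dict(sample_counts))
```

For the real data, `count_feature_types_in_file("CerSpe_IZO01_v1.0.a1.genes.gff3.gz")` would produce the same kind of summary, assuming the file has been saved in the working directory.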

Functional Analysis

Functional annotations for the Cerasus speciosa IZO01 genome v1.0 are available for download below. The C. speciosa IZO01 genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan Cspeciosa_IZO01_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan Cspeciosa_IZO01_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Cspeciosa_IZO01_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Cspeciosa_IZO01_v1.0_KEGG-pathways.xlsx.gz

Transcript Alignments

Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Cerasus speciosa IZO01 genome assembly. Alignments with at least 97% alignment length and 97% identity were retained. The available files are in GFF3 format.

Fragaria x ananassa GDR RefTrans v1 Cspeciosa_IZO01_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 Cspeciosa_IZO01_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Cspeciosa_IZO01_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Cspeciosa_IZO01_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Cspeciosa_IZO01_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 Cspeciosa_IZO01_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 Cspeciosa_IZO01_v1.0_pyrus_GDR_reftransV1
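
The 97% thresholds used for the transcript alignments can be sketched as a filter over BLAT's PSL output. The page does not state how alignment length and identity were computed, so both definitions below are assumptions: coverage is taken as aligned query bases divided by query size, and identity as matching bases divided by aligned bases. The sample records are invented.

```python
MIN_COVERAGE = 0.97  # assumed meaning of "alignment length of 97%"
MIN_IDENTITY = 0.97


def keep_alignment(psl_line, min_cov=MIN_COVERAGE, min_id=MIN_IDENTITY):
    """Return True if a PSL record passes both thresholds."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21 or not fields[0].isdigit():
        return False  # header line or malformed record
    matches, mismatches, rep_matches = (int(x) for x in fields[0:3])
    q_size = int(fields[10])
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    coverage = aligned / q_size                   # assumption, see lead-in
    identity = (matches + rep_matches) / aligned  # assumption, see lead-in
    return coverage >= min_cov and identity >= min_id


def query_name(psl_line):
    """Query (transcript) name is the 10th PSL column."""
    return psl_line.split("\t")[9]


# Invented PSL records: tx_1 passes, tx_2 covers too little of the query.
SAMPLE_PSL = [
    "\t".join(["980", "10", "0", "0", "0", "0", "1", "500", "+", "tx_1", "1000",
               "5", "995", "chr1", "100000", "2000", "3490", "2",
               "500,490,", "5,505,", "2000,3000,"]),
    "\t".join(["600", "50", "0", "0", "0", "0", "0", "0", "-", "tx_2", "1000",
               "0", "650", "chr2", "100000", "100", "750", "1",
               "650,", "350,", "100,"]),
]

kept = [query_name(line) for line in SAMPLE_PSL if keep_alignment(line)]
print(kept)
```

Other definitions are possible, such as measuring coverage from the query start and end coordinates, and they would give slightly different results for alignments with query-side gaps.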