Malus x domestica 'Hanfu' Whole Genome v1.0 Assembly & Annotation

Analysis Name: Malus x domestica 'Hanfu' Whole Genome v1.0 Assembly & Annotation
Method: FALCON (v1.0)
Source: Illumina HiSeq, PacBio and Hi-C reads for Hanfu
Date performed: 2023-06-23


Qin, S., Xu, G., He, J., Li, L., Ma, H., & Lyu, D. (2023). A chromosome-scale genome assembly of Malus domestica, a multi-stress resistant apple variety. Genomics, 115(3), 110627.


Hanfu apple is the main cultivar grown in the cool areas of Northeast, Northwest, and North China. Here, we present a chromosome-level Hanfu genome assembly using PacBio, Illumina and Hi-C sequencing data. The total contig length was 628.99 Mb, with scaffold and contig N50 sizes of 36.18 Mb and 1.25 Mb, respectively. The Hanfu genome had a total of 39,617 genes, of which we predicted the function for 38,816. Evolutionary analysis showed that Hanfu may have undergone a γ-event, a recent whole-genome duplication. A comparative analysis was conducted on the genomes of Hanfu and homozygous triploid HFTH1, which was cultured using the anthers of diploid Hanfu apples. Three types of variants were identified: 2,155,184 single nucleotide polymorphisms (SNPs), 413,108 insertions/deletions (indels), and 7,587 structural variants (SVs). This high-quality genome will provide a reference for the genetic improvement of apples and the breeding of more varieties with high resistance and high quality.

Table 1. Assembly and annotation statistics of the Hanfu genome.
Item                              Value
Total contig length (Mb)          628.99
Number of contigs                 978
Contig N50 (kb)                   1,247.14
Maximum contig length (kb)        9,303.7
Total genome length (Mb)          631.76
Number of scaffolds               140
Scaffold N50 (kb)                 36,183.2
Maximum scaffold length (kb)      51,004.6
GC content (%)                    38
Annotated protein-coding genes    39,617
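
As a reminder of what the N50 rows in Table 1 measure, the following is a minimal sketch of the usual N50 definition: the length of the sequence at which the cumulative length, counted from the longest sequence down, first reaches half of the total. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the Hanfu assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy lengths (hypothetical, not Hanfu data): total = 300, half = 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Applied to the contig or scaffold lengths of an assembly FASTA file, the same function would give the contig or scaffold N50 in base pairs.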


Table S10. BUSCO notation assessment of the Hanfu genome.

Species    BUSCO notation assessment results
Apple      C:94.1%[S:64.0%,D:30.1%],F:1.0%,M:4.9%,n:1440
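
In BUSCO's short notation, C is the percentage of complete BUSCOs (split into single-copy S and duplicated D), F is fragmented, M is missing, and n is the number of BUSCO groups searched. The sketch below parses the string reported above and checks that the percentages are internally consistent. The helper name `parse_busco` is an illustrative assumption, not part of BUSCO itself.

```python
import re


def parse_busco(summary):
    """Parse a BUSCO one-line summary into a dict (percentages as floats, n as int)."""
    fields = {}
    for key, value in re.findall(r"([CSDFM]):([\d.]+)%", summary):
        fields[key] = float(value)
    n_match = re.search(r"n:(\d+)", summary)
    if n_match:
        fields["n"] = int(n_match.group(1))
    return fields


busco = parse_busco("C:94.1%[S:64.0%,D:30.1%],F:1.0%,M:4.9%,n:1440")

# Complete = single-copy + duplicated (within rounding of one decimal place).
complete_ok = abs(busco["S"] + busco["D"] - busco["C"]) < 0.05
# Complete + fragmented + missing should account for all BUSCO groups.
total_ok = abs(busco["C"] + busco["F"] + busco["M"] - 100.0) < 0.05
print(busco, complete_ok, total_ok)
```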



Homology of the Malus x domestica Hanfu Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
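
The cutoff and best-hit selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: it assumes the blastp results are in the standard 12-column tabular format (`-outfmt 6`), and it assumes that "best hit" means the lowest E-value per query, with ties broken by the higher bit score. The function name `best_hits` and the example rows are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff stated in the text


def best_hits(handle, cutoff=EVALUE_CUTOFF):
    """Keep one hit per query among hits with E-value below the cutoff.

    Assumes 12-column BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # hit does not pass the E-value cutoff
        current = best.get(query)
        # Assumed ranking: lower E-value first, then higher bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example rows; the identifiers are made up for illustration.
example = io.StringIO(
    "geneA\tAT1G01010.1\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-50\t180\n"
    "geneA\tAT1G01020.1\t40.0\t150\t90\t0\t1\t150\t1\t150\t1e-10\t60\n"
    "geneB\tAT2G01010.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
print(hits)
```

Queries that appear in the result would correspond to the "hit" FASTA files listed below, and queries absent from it (such as `geneB` in this toy input) to the "noHit" files.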

Protein Homologs

Malus x domestica v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Mdomestica_Hanfu_v1.0_vs_arabidopsis.xlsx.gz
Malus x domestica v1.0 proteins with Arabidopsis (Araport11) (FASTA file) Mdomestica_Hanfu_v1.0_vs_arabidopsis_hit.fasta.gz
Malus x domestica v1.0 proteins without Arabidopsis (Araport11) (FASTA file) Mdomestica_Hanfu_v1.0_vs_arabidopsis_noHit.fasta.gz
Malus x domestica v1.0 proteins with SwissProt homologs (EXCEL file) Mdomestica_Hanfu_v1.0_vs_swissprot.xlsx.gz
Malus x domestica v1.0 proteins with SwissProt (FASTA file) Mdomestica_Hanfu_v1.0_vs_swissprot_hit.fasta.gz
Malus x domestica v1.0 proteins without SwissProt (FASTA file) Mdomestica_Hanfu_v1.0_vs_swissprot_noHit.fasta.gz
Malus x domestica v1.0 proteins with TrEMBL homologs (EXCEL file) Mdomestica_Hanfu_v1.0_vs_trembl.xlsx.gz
Malus x domestica v1.0 proteins with TrEMBL (FASTA file) Mdomestica_Hanfu_v1.0_vs_trembl_hit.fasta.gz
Malus x domestica v1.0 proteins without TrEMBL (FASTA file) Mdomestica_Hanfu_v1.0_vs_trembl_noHit.fasta.gz

The Malus x domestica Hanfu Genome v1.0 assembly files are available in FASTA format.


Chromosomes and scaffolds (FASTA file) Malus_x_domestica_Hanfu_V1.0.a1.fasta.gz
Gene Predictions

The Malus x domestica Hanfu v1.0 genome gene prediction files are available in GFF3 and FASTA format.


Genes (GFF3 file) Malus_x_domestica_Hanfu_V1.0.a1.genes.gff3.gz
Protein sequences (FASTA file) Malus_x_domestica_Hanfu_V1.0.a1.pep.fasta.gz
CDS sequences (FASTA file) Malus_x_domestica_Hanfu_V1.0.a1.cds.fasta.gz
Functional Analysis

Functional annotations for the Malus x domestica Hanfu Genome v1.0 are available for download below. The Malus x domestica Hanfu Genome v1.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


GO assignments from InterProScan Mdomestica_Hanfu_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan Mdomestica_Hanfu_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Mdomestica_Hanfu_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Mdomestica_Hanfu_v1.0_KEGG-pathways.xlsx.gz
Transcript Alignments
Transcript alignments were performed by the GDR Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Malus x domestica Hanfu genome assembly. Alignments with an alignment length of 97% and 97% identity were preserved. The available files are in GFF3 format.
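
The 97% filter described above can be sketched as a simple predicate. The source does not define the two percentages precisely, so the definitions here are assumptions: alignment length is taken as aligned bases divided by transcript length, identity as matching bases divided by aligned bases, and both thresholds are treated as minimums. The function name `passes_filter` and its parameters are hypothetical.

```python
MIN_FRACTION = 0.97  # threshold stated in the text, treated here as a minimum


def passes_filter(matches, mismatches, query_length, min_fraction=MIN_FRACTION):
    """Return True if an alignment meets both thresholds.

    Assumed definitions (not confirmed by the source):
      coverage = aligned bases / transcript length
      identity = matching bases / aligned bases
    """
    aligned = matches + mismatches
    if aligned == 0 or query_length == 0:
        return False
    coverage = aligned / query_length
    identity = matches / aligned
    return coverage >= min_fraction and identity >= min_fraction


# Hypothetical alignments of a 1,000-base transcript.
kept = passes_filter(matches=980, mismatches=10, query_length=1000)
too_short = passes_filter(matches=900, mismatches=10, query_length=1000)
too_divergent = passes_filter(matches=950, mismatches=40, query_length=1000)
print(kept, too_short, too_divergent)
```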


Fragaria x ananassa GDR RefTrans v1 Mdomestica_Hanfu_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 Mdomestica_Hanfu_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Mdomestica_Hanfu_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Mdomestica_Hanfu_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Mdomestica_Hanfu_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 Mdomestica_Hanfu_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 Mdomestica_Hanfu_v1.0_pyrus_GDR_reftransV1