Malus baccata Genome v1.0 Assembly & Annotation

Analysis Name: Malus baccata Genome v1.0 Assembly & Annotation
Method: SOAPdenovo (2.04.4)
Source: Malus baccata Genome v1.0 Assembly & Annotation
Date performed: 2019-08-28

About the Assembly

Malus baccata is one of four wild apple species that can hybridize with the cultivated apple species (Malus domestica). It is widely used in high-latitude apple-producing areas as a rootstock and breeding resource because of its disease resistance and cold tolerance. The lack of a reference genome has limited the application of M. baccata for apple breeding. We present a draft reference genome for M. baccata. The assembled sequence, consisting of 665 Mb with a scaffold N50 value of 452 kb, included transposable elements (413 Mb) and 46,114 high-quality protein-coding genes. According to a genetic map derived from 390 sibling lines, 72% of the assembly and 85% of the putative genes were anchored to 17 linkage groups. Many of the M. baccata genes under positive selection pressure were associated with plant–pathogen interaction pathways. We identified 2,345 transcription factor-encoding genes in 58 families in the M. baccata genome. Genes related to disease defense and cold tolerance were also identified. A total of 462 putative nucleotide-binding site (NBS)-leucine-rich-repeat (LRR) genes, 177 receptor-like kinase (RLK) genes, and 51 receptor-like protein (RLP) genes were identified in this genome assembly. The M. baccata genome contained 3,978 cold-regulated genes, and 50% of these have promoters containing a DREB motif, which can be induced by the CBF gene. We herein present the first M. baccata genome assembly, which may be useful for exploring genetic variations in diverse apple germplasm and for facilitating marker-assisted breeding of new apple cultivars exhibiting resistance to disease and cold stress.


Chen X, Li S, Zhang D, Han M, Jin X, Zhao C, Wang S, Xing L, Ma J, Ji J, An N. Sequencing of a Wild Apple (Malus baccata) Genome Unravels the Differences Between Cultivated and Wild Apple Species Regarding Disease Resistance and Cold Tolerance. G3: GENES, GENOMES, GENETICS July 1, 2019 vol. 9 no. 7 2051-2060.


Homology of the Malus baccata genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (TAIR10), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
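The best-hit selection described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build the reports: it assumes the blastp results are in the standard 12-column tabular layout (`-outfmt 6`, with the E-value in column 11), which this page does not state, and it picks the hit with the lowest E-value per query, keeping the first one on ties. The function name `best_hits` and the protein identifiers in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List

# Column index of the E-value in BLAST tabular output (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EVALUE_COLUMN = 10


def best_hits(lines: Iterable[str], max_evalue: float) -> Dict[str, List[str]]:
    """Return the lowest-E-value hit per query, keeping only hits below max_evalue."""
    best: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query = fields[0]
        evalue = float(fields[EVALUE_COLUMN])
        if evalue >= max_evalue:
            continue  # the cutoff is "less than", so equal values are rejected
        if query not in best or evalue < float(best[query][EVALUE_COLUMN]):
            best[query] = fields
    return best
```

For example, `best_hits(open("blastp_vs_swissprot.tsv"), 1e-6)` would apply the SwissProt cutoff quoted above to a hypothetical results file, and `1e-9` would be passed instead for NCBI nr.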


Protein Homologs

Malus baccata v1.0 proteins with NCBI nr homologs (EXCEL file) mbaccata-v1.0_vs_nr.xlsx.gz
Malus baccata v1.0 proteins with NCBI nr (FASTA file) mbaccata-v1.0_vs_nr_hit.fasta.gz
Malus baccata v1.0 proteins without NCBI nr (FASTA file) mbaccata-v1.0_vs_nr_noHit.fasta.gz
Malus baccata v1.0 proteins with arabidopsis (Araport11) homologs (EXCEL file) mbaccata-v1.0_vs_arabidopsis.xlsx.gz
Malus baccata v1.0 proteins with arabidopsis (Araport11) (FASTA file) mbaccata-v1.0_vs_arabidopsis_hit.fasta.gz
Malus baccata v1.0 proteins without arabidopsis (Araport11) (FASTA file) mbaccata-v1.0_vs_arabidopsis_noHit.fasta.gz
Malus baccata v1.0 proteins with SwissProt homologs (EXCEL file) mbaccata-v1.0_vs_swissprot.xlsx.gz
Malus baccata v1.0 proteins with SwissProt (FASTA file) mbaccata-v1.0_vs_swissprot_hit.fasta.gz
Malus baccata v1.0 proteins without SwissProt (FASTA file) mbaccata-v1.0_vs_swissprot_noHit.fasta.gz
Malus baccata v1.0 proteins with TrEMBL homologs (EXCEL file) mbaccata-v1.0_vs_trembl.xlsx.gz
Malus baccata v1.0 proteins with TrEMBL (FASTA file) mbaccata-v1.0_vs_trembl_hit.fasta.gz
Malus baccata v1.0 proteins without TrEMBL (FASTA file) mbaccata-v1.0_vs_trembl_noHit.fasta.gz



All assembly and annotation files are available for download by selecting the desired data type in the right-hand side bar. Each data type page provides a description of the available files and links to download them.


The Malus baccata Genome v1.0 assembly file is available in FASTA format.


Scaffold (FASTA file) mbaccata-v1.0.fasta.gz
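As a quick sanity check on a downloaded assembly, the scaffold N50 can be recomputed from the FASTA file. The sketch below uses the common definition of N50 (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length). The function names are illustrative, and the result may differ from the published 452 kb if the authors applied filters, such as a minimum scaffold length, that are not described on this page.

```python
import gzip
from typing import Iterable, List


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached at least half of the assembly
            return length
    return 0


def n50_of_gzipped_fasta(path: str) -> int:
    """Convenience wrapper for a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return n50(sequence_lengths(handle))
```

Calling `n50_of_gzipped_fasta("mbaccata-v1.0.fasta.gz")` on the downloaded scaffold file returns the N50 in base pairs.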

Gene Predictions

The Malus baccata v1.0 genome gene prediction files are available in FASTA and GFF3 formats.


Protein sequences (FASTA file) mbaccata-v1.0.proteins.fasta.gz
CDS (FASTA file) mbaccata-v1.0.CDs.fasta.gz
Genes (GFF3 file) mbaccata-v1.0.genes.gff3.gz
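The GFF3 file can be summarized with a few lines of Python, for example to count predicted genes per scaffold. This is a sketch under the assumption that gene features carry the type `gene` in the third column, as is conventional in GFF3; the file itself was not inspected here, and the function name is illustrative.

```python
from collections import Counter
from typing import Iterable


def genes_per_scaffold(lines: Iterable[str]) -> Counter:
    """Count features of type 'gene' per sequence ID in GFF3-formatted lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an optional embedded sequence section follows; stop parsing
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, feature_type = fields[0], fields[2]
        if feature_type == "gene":
            counts[seqid] += 1
    return counts
```

To run it on the download, open the file with `gzip.open("mbaccata-v1.0.genes.gff3.gz", "rt")` and pass the handle to `genes_per_scaffold`; `sum(counts.values())` then gives the total number of gene features, which can be compared with the 46,114 protein-coding genes reported for this assembly.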


Functional Analysis

Functional annotations for the Malus baccata genome v1.0 are available for download below. The Malus baccata genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


GO assignments from InterProScan mbaccata-v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan mbaccata-v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs mbaccata-v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways mbaccata-v1.0_KEGG-pathways.xlsx.gz