Prunus mongolica Whole Genome v1.0 Assembly & Annotation

Method: Hifiasm (0.16)
Source: Illumina reads and PacBio HiFi reads for Prunus mongolica v1.0
Date performed: 2023-06-27


Chromosome-level genome assembly of an endangered plant Prunus mongolica using PacBio and Hi-C technologies 
Qiang Zhu, Yali Wang, Ning Yao, Xilu Ni, Cuiping Wang, Meng Wang, Lei Zhang, Wenyu Liang. DNA Research, Volume 30, Issue 4, August 2023, dsad012.


Prunus mongolica is an ecologically and economically important xerophytic tree native to Northwest China. Here, we report a high-quality, chromosome-level P. mongolica genome assembly integrating PacBio high-fidelity sequencing and Hi-C technology. The assembled genome was 233.17 Mb in size, with 98.89% assigned to eight pseudochromosomes. The genome had contig and scaffold N50s of 24.33 Mb and 26.54 Mb, respectively, a BUSCO completeness score of 98.76%, and CEGMA indicated that 98.47% of the assembled genome was reliably annotated. The genome contained a total of 88.54 Mb (37.97%) of repetitive sequences and 23,798 protein-coding genes. We found that P. mongolica experienced two whole-genome duplications, with the most recent event occurring ~3.57 million years ago. Phylogenetic and chromosome syntenic analyses revealed that P. mongolica was closely related to P. persica and P. dulcis. Furthermore, we identified a number of candidate genes involved in drought tolerance and fatty acid biosynthesis. These candidate genes are likely to prove useful in studies of drought tolerance and fatty acid biosynthesis in P. mongolica, and will provide important genetic resources for molecular breeding and improvement experiments in Prunus species. This high-quality reference genome will also accelerate the study of the adaptation of xerophytic plants to drought.

Table 1. Global statistics of Prunus mongolica genome assembly and annotation

Parameter Size or number
Estimate of genome size (survey), bp 226,470,058
Assembled genome size, bp 233,169,053
Total length of contigs, bp 233,168,353
Total number of contigs 91
N50 of contigs, bp 24,328,480
Largest contig, bp 30,055,353
Total length of scaffolds, bp 233,169,053
Total number of scaffolds 84
N50 of scaffolds, bp 26,540,977
Largest scaffold, bp 47,189,607
GC content, % 38.20
Complete CEGMA, % 98.47
Complete BUSCOs, % 98.76
Total length of repeat, bp 88,539,904
Repeat density, % 37.97
Long terminal repeat (LTR) density, % 17.92
Microsatellite repeat density, % 0.94
Number of protein-coding genes 23,798
Number of annotated genes 23,702
Number of rRNA 832
Number of tRNA 625
Number of miRNAs 217
Number of snRNAs 283
Number of snoRNAs 491
Number of pseudogenes 161
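
The derived figures in Table 1 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: it recomputes the repeat density from the two table values and shows how an N50 is conventionally calculated. The toy contig lengths are hypothetical and serve only to illustrate the function.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def percent(part, whole):
    """Express part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Values copied from Table 1.
assembled_bp = 233_169_053
repeat_bp = 88_539_904
repeat_density = percent(repeat_bp, assembled_bp)
print(repeat_density)  # matches the 37.97 % reported in the table

# Hypothetical contig lengths, NOT taken from the assembly.
toy_n50 = n50([10, 8, 6, 4, 2])
print(toy_n50)
```

With the real assembly, the list passed to `n50` would hold the 91 contig lengths or the 84 scaffold lengths.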

Homology of the Prunus mongolica Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.

Protein Homologs

Prunus mongolica v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Prunus_mongolica_v1.0_vs_arabidopsis.xlsx.gz
Prunus mongolica v1.0 proteins with Arabidopsis (Araport11) (FASTA file) Pmongolica_v1.0_vs_arabidopsis_hit.fasta.gz
Prunus mongolica v1.0 proteins without Arabidopsis (Araport11) (FASTA file) Pmongolica_v1.0_vs_arabidopsis_noHit.fasta.gz
Prunus mongolica v1.0 proteins with SwissProt homologs (EXCEL file) Pmongolica_v1.0_vs_swissprot.xlsx.gz
Prunus mongolica v1.0 proteins with SwissProt (FASTA file) Pmongolica_v1.0_vs_swissprot_hit.fasta.gz
Prunus mongolica v1.0 proteins without SwissProt (FASTA file) Pmongolica_v1.0_vs_swissprot_noHit.fasta.gz
Prunus mongolica v1.0 proteins with TrEMBL homologs (EXCEL file) Pmongolica_v1.0_vs_trembl.xlsx.gz
Prunus mongolica v1.0 proteins with TrEMBL (FASTA file) Pmongolica_v1.0_vs_trembl_hit.fasta.gz
Prunus mongolica v1.0 proteins without TrEMBL (FASTA file) Pmongolica_v1.0_vs_trembl_noHit.fasta.gz
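
The best-hit selection described above can be approximated as follows. This is a sketch under stated assumptions: it expects blastp results in the standard 12-column tabular layout (`-outfmt 6`), applies the 1e-6 cutoff from the text, and defines "best hit" as the highest bit score per query. The actual criteria used to build the downloadable reports are not stated on this page, and the sequence identifiers in the sample are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff stated in the text


def best_hits(handle, cutoff=EVALUE_CUTOFF):
    """Return {query_id: row} holding the highest-bit-score hit per query,
    keeping only hits whose e-value is below the cutoff.

    Assumes BLAST tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, evalue, bitscore = row[0], float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue
        if query not in best or bitscore > float(best[query][11]):
            best[query] = row
    return best


# Hypothetical rows for illustration only.
sample = (
    "geneA\tAT1G01010.1\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-80\t250\n"
    "geneA\tAT2G02020.1\t40.0\t180\t108\t0\t1\t180\t1\t180\t1e-20\t90\n"
    "geneB\tAT3G03030.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(io.StringIO(sample))
for query, row in hits.items():
    print(query, row[1], row[10])
```

Queries absent from the returned dictionary correspond to the "noHit" category in the file list.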



The Prunus mongolica Genome v1.0 assembly files are available in FASTA format.


Chromosomes (FASTA file) Prunus_mongolica_V1.0.a1.fasta.gz
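
Once downloaded, the assembly FASTA can be summarized without third-party libraries. The sketch below is illustrative only; the function name is hypothetical, and it assumes a standard (optionally gzip-compressed) FASTA file with `>` header lines.

```python
import gzip


def read_fasta_lengths(path):
    """Return {sequence_id: (length, gc_count)} for a FASTA file,
    opening it with gzip when the name ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    stats = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the ID.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                stats[name] = [0, 0]
            elif name is not None:
                seq = line.upper()
                stats[name][0] += len(seq)
                stats[name][1] += seq.count("G") + seq.count("C")
    return {key: tuple(value) for key, value in stats.items()}
```

Calling `read_fasta_lengths("Prunus_mongolica_V1.0.a1.fasta.gz")` on the downloaded file would give per-sequence lengths and GC counts, from which totals such as the assembled size and GC content in Table 1 could be recomputed.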


Gene Predictions

The Prunus mongolica v1.0 genome gene prediction files are available in GFF3 and FASTA format.


Genes (GFF3 file) Prunus_mongolica_V1.0.a1.genes.gff3.gz
Protein sequences (FASTA file) Prunus_mongolica_V1.0.a1.pep.fasta.gz
CDS sequences (FASTA file) Prunus_mongolica_V1.0.a1.cds.fasta.gz
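
A quick sanity check on the GFF3 file is to count rows per feature type. This is a minimal sketch with a hypothetical function name; it assumes the file follows the nine-column GFF3 layout. Which feature types the file actually uses (for example `gene` or `mRNA`) is not stated here, so the counts may not map one-to-one onto the gene numbers in Table 1.

```python
import gzip
from collections import Counter


def count_gff3_features(path):
    """Count rows per feature type (column 3) in a GFF3 file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section ends the annotations
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:
                counts[fields[2]] += 1
    return counts
```

For example, `count_gff3_features("Prunus_mongolica_V1.0.a1.genes.gff3.gz")` would return a `Counter` keyed by feature type.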

Functional Analysis

Functional annotations for the Prunus mongolica Genome v1.0 are available for download below. The Prunus mongolica Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


GO assignments from InterProScan Prunus_mongolica_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan Prunus_mongolica_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Prunus_mongolica_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Prunus_mongolica_v1.0_KEGG-pathways.xlsx.gz

Transcript Alignments

Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the Prunus mongolica genome assembly. Alignments with an alignment length of 97% and 97% identity were preserved. The available files are in GFF3 format.


Fragaria x ananassa GDR RefTrans v1 Prunus_mongolica_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 Prunus_mongolica_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Prunus_mongolica_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Prunus_mongolica_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Prunus_mongolica_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 Prunus_mongolica_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 Prunus_mongolica_v1.0_pyrus_GDR_reftransV1
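
The 97% length and identity filter applied to the BLAT alignments can be sketched as below. This is an illustration, not the GDR pipeline: it assumes BLAT's default PSL output, takes coverage as the aligned query span divided by query size, and takes identity as matching bases over aligned bases. The exact definitions used to produce the GFF3 files are not given on this page, and the function name and sample records are hypothetical.

```python
def passes_thresholds(psl_line, min_cov=0.97, min_ident=0.97):
    """Return True if a PSL record meets both thresholds.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    10 qSize, 11 qStart, 12 qEnd.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches = int(fields[0])
    mismatches = int(fields[1])
    rep_matches = int(fields[2])
    q_size, q_start, q_end = int(fields[10]), int(fields[11]), int(fields[12])
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    coverage = (q_end - q_start) / q_size            # assumed definition
    identity = (matches + rep_matches) / aligned     # assumed definition
    return coverage >= min_cov and identity >= min_ident


def make_psl(matches, mismatches, q_size, q_start, q_end):
    """Build a hypothetical 21-column PSL line for demonstration."""
    cols = [matches, mismatches, 0, 0, 0, 0, 0, 0, "+", "tx1",
            q_size, q_start, q_end, "chr1", 30000000, 1000,
            1000 + (q_end - q_start), 1, f"{q_end - q_start},",
            f"{q_start},", "1000,"]
    return "\t".join(str(c) for c in cols)


good = make_psl(990, 5, 1000, 0, 995)   # 99.5% coverage, ~99.5% identity
short = make_psl(500, 0, 1000, 0, 500)  # only 50% of the transcript aligned
print(passes_thresholds(good), passes_thresholds(short))
```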