Prunus armeniaca Genome v1.0 Assembly & Annotation

Method: PacBio, Canu (na)
Source: Prunus armeniaca Genome v1.0 (apricot)
Date performed: 2019-11-20


Jiang F, Zhang J, Wang S, Yang L, Luo Y, Gao S, Zhang M, Wu S, Hu S, Sun H, Wang Y. The apricot (Prunus armeniaca L.) genome elucidates Rosaceae evolution and beta-carotenoid synthesis. Horticulture Research. 2019; 6:128.


Apricots, scientifically known as Prunus armeniaca L., are drupes that resemble and are closely related to peaches or plums. As one of the top consumed fruits, apricots are widely grown worldwide except in Antarctica. A high-quality reference genome for apricot is still unavailable, which has become a handicap that has dramatically limited the elucidation of the associations of phenotypes with the genetic background, evolutionary diversity, and population diversity in apricot. DNA from P. armeniaca was used to generate a standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on Sequel SMRT Cells, generating a total of 16.54 Gb of PacBio subreads (N50 = 13.55 kb). The high-quality P. armeniaca reference genome presented here was assembled using long-read single-molecule sequencing at approximately 70× coverage and 171× Illumina reads (40.46 Gb), combined with a genetic map for chromosome scaffolding. The assembled genome size was 221.9 Mb, with a contig NG50 size of 1.02 Mb. Scaffolds covering 92.88% of the assembled genome were anchored on eight chromosomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed 98.0% complete genes. We predicted 30,436 protein-coding genes, and 38.28% of the genome was predicted to be repetitive. We found 981 contracted gene families, 1324 expanded gene families and 2300 apricot-specific genes. The differentially expressed gene (DEG) analysis indicated that a change in the expression of the 9-cis-epoxycarotenoid dioxygenase (NCED) gene but not the lycopene beta-cyclase (LcyB) gene results in a low β-carotenoid content in the white cultivar "Dabaixing". This complete and highly contiguous P. armeniaca reference genome will be of help for future studies of resistance to plum pox virus (PPV) and the identification and characterization of important agronomic genes and breeding strategies in apricot.
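The contig NG50 quoted above can be illustrated with a small calculation. The sketch below is not the authors' pipeline. It assumes the usual definition (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of an estimated genome size) and uses toy numbers rather than the apricot contigs. The genome size estimate used by the authors is not stated here, so `genome_size` is a hypothetical parameter.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of a set of contigs.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the (estimated) genome size. Passing the
    total assembly length as genome_size gives the ordinary N50.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0  # the contigs cover less than half of the genome size


# Toy contig lengths (illustrative only, not the apricot assembly).
toy_contigs = [50, 40, 30, 20, 10]

ng50_value = ng50(toy_contigs, genome_size=200)            # estimated genome size
n50_value = ng50(toy_contigs, genome_size=sum(toy_contigs))  # assembly size
print(ng50_value, n50_value)
```

Because NG50 is normalised to the genome size rather than the assembly size, it is comparable between assemblies of the same species that differ in total length.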


Homology of the Prunus armeniaca genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
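As an illustration of how an E-value cutoff and a best-hit report relate, here is a minimal sketch. It assumes blastp results in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, E-value in column 11, bit score in column 12); the output format actually used for these reports is not stated, and the IDs in the toy data are hypothetical.

```python
import csv
import io


def best_hits(tabular_lines, max_evalue):
    """Return {query: (subject, evalue, bitscore)} with one best hit per query.

    Only hits with an E-value strictly below max_evalue are considered.
    The best hit is the one with the lowest E-value; ties are broken by
    the higher bit score.
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Toy tabular output with hypothetical query and subject IDs.
example = (
    "geneA\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "geneA\tsubj2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120\n"
    "geneB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t45\n"
)

strict_hits = best_hits(io.StringIO(example), max_evalue=1e-9)   # cutoff used for nr
relaxed_hits = best_hits(io.StringIO(example), max_evalue=1e-6)  # cutoff used for the other databases
print(sorted(strict_hits), sorted(relaxed_hits))
```

In this toy input, `geneB` has a hit only under the more permissive 1e-6 cutoff, which mirrors how the same protein can appear in a "hit" file for one database and a "noHit" file for another.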


Protein Homologs

Prunus armeniaca v1.0 proteins with NCBI nr homologs (EXCEL file) parmeniaca-v1.0_vs_nr.xlsx.gz
Prunus armeniaca v1.0 proteins with NCBI nr (FASTA file) parmeniaca-v1.0_vs_nr_hit.fasta.gz
Prunus armeniaca v1.0 proteins without NCBI nr (FASTA file) parmeniaca-v1.0_vs_nr_noHit.fasta.gz
Prunus armeniaca v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) parmeniaca-v1.0_vs_arabidopsis.xlsx.gz
Prunus armeniaca v1.0 proteins with Arabidopsis (Araport11) (FASTA file) parmeniaca-v1.0_vs_arabidopsis_hit.fasta.gz
Prunus armeniaca v1.0 proteins without Arabidopsis (Araport11) (FASTA file) parmeniaca-v1.0_vs_arabidopsis_noHit.fasta.gz
Prunus armeniaca v1.0 proteins with SwissProt homologs (EXCEL file) parmeniaca-v1.0_vs_swissprot.xlsx.gz
Prunus armeniaca v1.0 proteins with SwissProt (FASTA file) parmeniaca-v1.0_vs_swissprot_hit.fasta.gz
Prunus armeniaca v1.0 proteins without SwissProt (FASTA file) parmeniaca-v1.0_vs_swissprot_noHit.fasta.gz
Prunus armeniaca v1.0 proteins with TrEMBL homologs (EXCEL file) parmeniaca-v1.0_vs_trembl.xlsx.gz
Prunus armeniaca v1.0 proteins with TrEMBL (FASTA file) parmeniaca-v1.0_vs_trembl_hit.fasta.gz
Prunus armeniaca v1.0 proteins without TrEMBL (FASTA file) parmeniaca-v1.0_vs_trembl_noHit.fasta.gz



All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.


The Prunus armeniaca L. Genome v1.0 assembly file is available in FASTA format.


Chromosomes (FASTA file) parmeniaca-v1.0.fasta.gz
Chromosomes (masked) (FASTA file) parmeniaca-v1.0.masked.fasta.gz


Gene Predictions

The Prunus armeniaca L. v1.0 genome gene prediction files are available in FASTA and GFF3 formats.


Protein sequences (FASTA file) parmeniaca-v1.0.proteins.fasta.gz
CDS (FASTA file) parmeniaca-v1.0.CDs.fasta.gz
Genes (GFF3 file) parmeniaca-v1.0.genes.gff3.gz
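For readers who want a quick sanity check of a downloaded GFF3 file, the following sketch tallies feature types. It relies only on the GFF3 convention of nine tab-separated columns with the feature type in column 3. The feature type names present in the actual file are not stated here, so the toy records and their IDs are hypothetical.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 records per feature type (column 3)."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature record (e.g. embedded FASTA sequence)
        counts[columns[2]] += 1
    return counts


# Toy records with hypothetical IDs, standing in for the real file.
toy_gff3 = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tsrc\tCDS\t150\t800\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr1\tsrc\tgene\t2000\t2500\t.\t-\t.\tID=gene2",
]

feature_counts = count_feature_types(toy_gff3)
print(dict(feature_counts))
```

The same function accepts an open file handle, so the gzipped download can be read directly with `gzip.open("parmeniaca-v1.0.genes.gff3.gz", "rt")` from the standard library.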


Functional Analysis

Functional annotations for the Prunus armeniaca genome v1.0 are available for download below. The Prunus armeniaca genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


GO assignments from InterProScan parmeniaca-v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan parmeniaca-v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs parmeniaca-v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways parmeniaca-v1.0_KEGG-pathways.xlsx.gz


Transcript Alignments
Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the Prunus armeniaca genome assembly. Alignments with an alignment length of at least 97% and at least 97% identity were retained. The available files are in GFF3 format.


Fragaria x ananassa GDR RefTrans v1 Prunus armeniaca_v1.0_f.x.ananassa_GDR_reftransV1
Malus x domestica GDR RefTrans v1 Prunus armeniaca_v1.0_m.x.domestica_GDR_reftransV1
Prunus avium GDR RefTrans v1 Prunus armeniaca_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Prunus armeniaca_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Prunus armeniaca_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Prunus armeniaca_v1.0_rubus_GDR_reftransV2
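The 97% alignment length and 97% identity thresholds used for the transcript alignments can be sketched as a filter over BLAT's default PSL output. This is an illustration under stated assumptions, not the GDR team's procedure: the exact definitions they applied are not given, so coverage is taken here as the aligned fraction of the transcript (query) and identity as matching bases over aligned bases. The function names and the toy PSL record are hypothetical.

```python
def parse_psl_line(line):
    """Extract the PSL fields needed for filtering from one PSL record.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    10 qSize, 11 qStart, 12 qEnd.
    """
    fields = line.rstrip("\n").split("\t")
    return {
        "matches": int(fields[0]),
        "mismatches": int(fields[1]),
        "rep_matches": int(fields[2]),
        "q_size": int(fields[10]),
        "q_start": int(fields[11]),
        "q_end": int(fields[12]),
    }


def passes_filter(matches, mismatches, rep_matches, q_size, q_start, q_end,
                  min_coverage=0.97, min_identity=0.97):
    """Return True if an alignment meets both thresholds.

    Assumed definitions (not confirmed by the source):
      coverage = aligned span of the query / query length
      identity = matching bases / aligned bases
    """
    aligned_bases = matches + mismatches + rep_matches
    if aligned_bases == 0 or q_size == 0:
        return False
    coverage = (q_end - q_start) / q_size
    identity = (matches + rep_matches) / aligned_bases
    return coverage >= min_coverage and identity >= min_identity


# Toy PSL record for a hypothetical 1000-base transcript.
toy_psl = ("980\t10\t0\t0\t0\t0\t0\t0\t+\ttranscript1\t1000\t0\t990\t"
           "chr1\t5000000\t1000\t1990\t1\t990,\t0,\t1000,")

kept = passes_filter(**parse_psl_line(toy_psl))
print(kept)
```

Other identity measures exist, including a BLAT-specific percent identity score that also penalises gaps, so results near the threshold may differ depending on which definition is used.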