Prunus avium Whole Genome Assembly v1.0 & Annotation v1 (v1.0.a1)

Method  SOAPdenovo (2)
Source  Illumina paired-end reads
Date performed  2017-07-14

About the Assembly

The sweet cherry (Prunus avium) genome sequences were determined using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. A total of 43,349 complete and partial protein-encoding genes were predicted. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site–associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers.


Estimated genome size (bp)  352,883,670 
# of scaffolds  10,148 
Size of scaffolds (bp)  272,361,615 
Scaffold N50 (bp)  219,566 
Longest scaffold (bp)  1,460,269 
GC (%)  37.7 
# of genes  43,673 
Mean size of genes (bp)  1,097 
Repeat (%)  43.8 
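
The scaffold N50 reported above is the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly size. A minimal sketch of that calculation is shown below; the function name n50 and the toy lengths are illustrative assumptions, not part of the published pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and summed until the
    running total reaches at least half of the overall total; the length
    at which that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy example (hypothetical lengths, not taken from the assembly):
# total = 20, half = 10, and 6 + 5 = 11 is the first sum to reach 10.
print(n50([6, 5, 4, 3, 2]))  # prints 5
```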



Shirasawa K, Isuzugawa K, Ikenaga M, Saito Y, Yamamoto T, Hirakawa H, Isobe S (2017)
The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.
DNA Res, doi:10.1093/dnares/dsx020


The Prunus avium v1.0.a1 genome assembly files are available in FASTA and BED formats. The assembly contains 9 pseudomolecules and 10,148 scaffolds.


Pseudomolecule (FASTA file) Prunus_avium_v1.0.a1_pseudomolecule.fasta.gz
Scaffolds (FASTA file) Prunus_avium_v1.0.a1_scaffolds.fasta.gz
Scaffolds (BED file) Prunus_avium_v1.0.a1_scaffolds.bed.gz
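
Once downloaded, the gzip-compressed FASTA files can be read directly without unpacking them. The sketch below, using only the Python standard library, tallies sequence lengths so that the number of sequences, total size and longest sequence can be checked against the assembly statistics. The helper names fasta_lengths and summarize are hypothetical, and the local file path is assumed to be wherever the download was saved.

```python
import gzip


def fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the sequence ID.
            name = (line[1:].split() or [""])[0]
            length = 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def summarize(path):
    """Return (number of sequences, total bp, longest bp) for a .fasta.gz file."""
    with gzip.open(path, "rt") as handle:
        lengths = [n for _, n in fasta_lengths(handle)]
    return len(lengths), sum(lengths), max(lengths)
```

For example, summarize("Prunus_avium_v1.0.a1_pseudomolecule.fasta.gz") would return the three values for the pseudomolecule file, assuming it has been saved in the working directory.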


Gene Predictions

The Prunus avium v1.0.a1 genome gene prediction files are available in FASTA and GFF3 formats.


Transcript CDS sequences (FASTA file) Prunus_avium_v1.0.a1_cds.fasta.gz
Protein sequences  (FASTA file) Prunus_avium_v1.0.a1_protein.fasta.gz
Genes aligned to pseudomolecule (GFF3 file) Prunus_avium_v1.0.a1_pseudomolecule.genes.gff.gz
Genes aligned to scaffold (GFF3 file) Prunus_avium_v1.0.a1_scaffold.genes.gff.gz
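
The GFF3 files are tab-delimited with nine columns (sequence ID, source, feature type, start, end, score, strand, phase, attributes), so simple gene counts and mean gene sizes can be derived with a few lines of code. The sketch below assumes that gene models are recorded with the feature type "gene" in the third column; this is an assumption about these particular files, and the type may need to be changed (for example to "mRNA") after inspecting them. The function name gene_stats is hypothetical.

```python
def gene_stats(lines, feature_type="gene"):
    """Return (count, mean length in bp) of features of the given type.

    `lines` is any iterable of GFF3 text lines. The default feature type
    "gene" is an assumption about how the file labels gene models.
    """
    count, total = 0, 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        start, end = int(fields[3]), int(fields[4])
        count += 1
        total += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    mean = total / count if count else 0.0
    return count, mean
```

To apply it to a downloaded file, the lines can be supplied with gzip.open(path, "rt") from the standard library.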



Homology of the Prunus avium v1.0.a1 transcripts was determined by pairwise sequence comparison using the blastx algorithm against various protein databases. The results are available for download in Excel format. An expectation value (E-value) cutoff of less than 1e-6 was used for Arabidopsis proteins and less than 1e-9 for the NCBI nr, UniProt SwissProt, and UniProt TrEMBL databases.
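
The per-database E-value cutoffs described above can be expressed as a simple filter. The sketch below is illustrative only: the dictionary keys, the function names and the (query, subject, E-value) tuple layout are assumptions made for this example and do not reflect the column layout of the Excel files.

```python
# Cutoffs as stated in the text: hits must have an E-value below the threshold.
CUTOFFS = {
    "arabidopsis": 1e-6,
    "nr": 1e-9,
    "swissprot": 1e-9,
    "trembl": 1e-9,
}


def passes_cutoff(database, evalue):
    """Return True if the E-value is strictly below the database's cutoff."""
    return evalue < CUTOFFS[database]


def filter_hits(database, hits):
    """Keep only (query, subject, evalue) tuples that pass the cutoff."""
    return [hit for hit in hits if passes_cutoff(database, hit[2])]


# Hypothetical hits, for illustration only.
example_hits = [("t1", "p1", 1e-7), ("t2", "p2", 1e-12)]
print(len(filter_hits("arabidopsis", example_hits)))  # prints 2
print(len(filter_hits("nr", example_hits)))           # prints 1
```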


Protein Homologs

Sweet cherry transcripts with NCBI nr homologs Prunus_avium_v1.0.a1_vs_nr.xlsx
Sweet cherry transcripts with Arabidopsis homologs Prunus_avium_v1.0.a1_vs_arabidopsis.xlsx
Sweet cherry transcripts with UniProt SwissProt homologs Prunus_avium_v1.0.a1_vs_swissprot.xlsx
Sweet cherry transcripts with UniProt TrEMBL homologs Prunus_avium_v1.0.a1_vs_trembl.xlsx


Functional Analysis

Functional annotations for the Prunus avium v1.0.a1 genome are available for download below. The Prunus avium transcripts were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


Gene functions annotated by InterProScan Prunus_avium_v1.0.a1_functions.txt.gz
GO assignments from InterProScan Prunus_avium_v1.0.a1_genes2GO.txt.gz
IPR assignments from InterProScan Prunus_avium_v1.0.a1_genes2IPR.txt.gz
KEGG Hierarchy file (for viewing with KegHier) Prunus_avium_v1.0.a1_KEGG-hier.tar.gz
Transcripts mapped to KEGG Orthologs Prunus_avium_v1.0.a1_KEGG-orthologis.txt.gz
Transcripts mapped to KEGG Pathways Prunus_avium_v1.0.a1_KEGG-pathways.txt.gz



All assembly and annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files on the FTP repository.


The Prunus avium v1.0.a1 genome repeat files are available in GFF3 format.


Repeats on pseudomolecules Prunus_avium_v1.0.a1_pseudomolecule.repeats.gff.gz
Repeats on scaffolds Prunus_avium_v1.0.a1_scaffold.repeats.gff.gz

The Prunus avium v1.0.a1 genome markers and DNA polymorphisms are downloadable as BED and VCF files (provided by Shirasawa et al.).


CNVs Prunus_avium_v1.0.a1_pseudomolecule.CNVs.bed.gz
SNPs and indels Prunus_avium_v1.0.a1_pseudomolecule.SNPindels.vcf.gz
SSRs Prunus_avium_v1.0.a1_pseudomolecule.SSRs.bed.gz
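
Because SNPs and insertions/deletions are delivered together in one VCF file, it can be useful to separate them when selecting markers. The sketch below classifies each record by comparing the lengths of the REF and ALT alleles (columns 4 and 5 of a VCF line). This is a simplification under stated assumptions: symbolic alleles and multi-nucleotide substitutions are not treated specially and fall into the "indel" class, and the function names are hypothetical.

```python
def classify_variant(ref, alt):
    """Return "SNP" if REF and every ALT allele are single bases, else "indel"."""
    alleles = alt.split(",")  # multi-allelic sites list ALT alleles with commas
    if len(ref) == 1 and all(len(allele) == 1 for allele in alleles):
        return "SNP"
    return "indel"


def count_variants(lines):
    """Count SNP and indel records in an iterable of VCF text lines."""
    counts = {"SNP": 0, "indel": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts[classify_variant(fields[3], fields[4])] += 1
    return counts
```

The lines of a downloaded file can be supplied with gzip.open(path, "rt") from the standard library.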