Prunus persica Whole Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Prunus persica Whole Genome v1.0 Assembly & Annotation
Method: Arachne
Source: Prunus persica Sanger Reads
Date performed: 2012-01-16

For use in publications, please CITE the original paper in Nature Genetics:

The International Peach Genome Initiative (2013). The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 45, 487-494. doi:10.1038/ng.2586
 

Background

Peach (Prunus persica) is considered one of the best genetically characterized species in the Rosaceae, and it has distinct advantages that make it suitable as a model genome species for Prunus as well as for other species in the Rosaceae. While some Prunus species, such as cultivated plums and sour cherries, are polyploid, peach is a diploid with n = 8 and has a comparatively small genome, currently estimated at ~220-230 Mbp based on the peach v1.0 assembly. Peach has a relatively short juvenility period of 2-3 years, compared with 6-10 years for most other fruit tree species. In addition, a number of genes for fundamentally important traits have been genetically described in peach, including genes controlling flower and fruit development, tree growth habit, dormancy, cold hardiness, and disease and pest resistance.

Genome facts and statistics

Peach v1.0 was generated from DNA of the doubled haploid cultivar ‘Lovell’, which means that the genes and intervening DNA are “fixed”, or identical, for all alleles and both chromosomal copies of the genome. This doubled haploid nature was confirmed by the evaluation of >200 SSRs, and has facilitated a highly accurate and consistent assembly of the peach genome.

Peach v1.0 currently consists of 8 pseudomolecules (scaffolds) representing the 8 chromosomes of peach, numbered according to their corresponding linkage groups. The genome was sequenced to approximately 7.7-fold coverage by whole genome shotgun sequencing using the accurate Sanger methodology, and was assembled using Arachne. The assembled peach scaffolds cover nearly 99% of the peach genome, with over 92% having confirmed orientation. To further validate the quality of the assembly, 74,757 Prunus ESTs were queried against the genome at 90% identity and 85% coverage, and we found that only ~2% were missing. This is truly a high-quality genome! Gene prediction and annotation is an ongoing process that may take years to complete, but current estimates indicate that peach has 28,689 transcripts and 27,852 genes.
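The EST check described above can be illustrated with a minimal sketch. It assumes each EST alignment has already been summarized as percent identity, number of aligned EST bases, and EST length, and it assumes coverage means aligned bases divided by EST length; the page does not state how coverage was computed, and the function names are hypothetical.

```python
def est_is_placed(percent_identity, aligned_bases, est_length,
                  min_identity=90.0, min_coverage=85.0):
    """Return True if an EST alignment meets both thresholds.

    Coverage is taken here as aligned bases / EST length (an assumption).
    """
    if est_length <= 0:
        return False
    coverage = 100.0 * aligned_bases / est_length
    return percent_identity >= min_identity and coverage >= min_coverage


def missing_fraction(alignments):
    """Fraction of ESTs whose best alignment fails the thresholds.

    `alignments` is a list of (percent_identity, aligned_bases, est_length).
    """
    if not alignments:
        return 0.0
    missing = sum(1 for a in alignments if not est_is_placed(*a))
    return missing / len(alignments)


# Made-up values for illustration only, not data from the peach assembly.
toy_alignments = [
    (95.0, 900, 1000),  # passes both thresholds
    (92.0, 500, 1000),  # fails coverage
    (80.0, 990, 1000),  # fails identity
    (99.0, 850, 1000),  # passes, exactly at the coverage threshold
]
toy_missing = missing_fraction(toy_alignments)
```

With the real data, a result near 0.02 would correspond to the ~2% of ESTs reported as missing.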

Links to the peach genome browsers housed at JGI, the Genome Database for Rosaceae (GDR), and the Italian version housed at the Istituto di Genomica Applicata (IGA), along with links to the raw data are provided below. Also provided are resources and links to help you navigate and utilize the peach genome to further your research.
 

Additionally, access to the peach genome is provided by GDR, JGI Phytozome and IGA.

 

Homology

The Prunus persica v1.0 genome homology files are available for download in Excel format with links to GBrowse and to external databases for matched homologs. All homology data were determined using the predicted peach gene transcripts (28,692 sequences) and NCBI blastx against various protein databases. An expectation value cutoff of 1e-6 was used. For EST alignments, the NCBI Rosaceae and Genera EST databases were downloaded and filtered for quality before blasting.
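For readers who rerun a comparable search themselves, the sketch below shows one way to apply the same expectation value cutoff to BLAST tabular output. It assumes the standard 12-column tabular layout, in which the e-value is the 11th column, and it treats the cutoff as inclusive; neither detail is stated on this page. The identifiers in the example rows are illustrative only, and the files distributed here are Excel files, not this format.

```python
import csv
import io


def filter_hits(lines, max_evalue=1e-6):
    """Yield (query, subject, evalue) for hits at or below the cutoff.

    Assumes 12-column BLAST tabular output; column 11 holds the e-value.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            yield row[0], row[1], evalue


# Two made-up rows: the first passes the cutoff, the second does not.
example = io.StringIO(
    "ppa000001m\tAT1G01010.1\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t180\n"
    "ppa000002m\tAT1G01020.1\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t30\n"
)
hits = list(filter_hits(example))
```

To filter a real file, pass an open file handle instead of the `io.StringIO` example.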

Protein Homologs

predicted gene functions Prunus_persica_v1.0_gene_function.xls
24,423 peach gene transcripts with Arabidopsis homologs Prunus_persica_v1.0_vs_arabidopsis.xls
18,822 peach gene transcripts with Swiss-Prot homologs Prunus_persica_v1.0_vs_swissprot.xls
26,731 peach gene transcripts with TrEMBL homologs Prunus_persica_v1.0_vs_trembl.xls

Rosaceae EST alignments

Apple ESTs Prunus_persica_v1.0_vs_malus_ESTs.xls
Strawberry ESTs Prunus_persica_v1.0_vs_fragaria_ESTs.xls
Prunus ESTs Prunus_persica_v1.0_vs_prunus_ESTs.xls
Rosa ESTs Prunus_persica_v1.0_vs_rosa_ESTs.xls
Rubus ESTs Prunus_persica_v1.0_vs_rubus_ESTs.xls
Pyrus ESTs Prunus_persica_v1.0_vs_pyrus_ESTs.xls

 

Downloads
All assembly and annotation files are available for download by selecting the desired data type in the left-hand side bar. Each data type page provides a description of the available files and links to download them.
Assembly

The Prunus persica v1.0 genome assembly files are available in FASTA and GFF3 formats. There are a total of 202 scaffolds in this assembly of peach. The pseudomolecules corresponding to the eight chromosomes of peach are the first eight scaffolds of the assembly. In future releases these pseudomolecules will most likely be renamed, but for now they are named scaffold_1, scaffold_2, scaffold_3, etc.

Downloads

Scaffolds (FASTA file)  Prunus_persica_v1.0_scaffolds.fa.gz
Scaffolds (GFF3 file)  Prunus_persica_v1.0_scaffolds.gff3.gz
Pseudomolecules (first 8 scaffolds) (FASTA file)  Prunus_persica_v1.0_chr.fa.gz
Scaffolds masked for repeats (hard masking with N's)  Prunus_persica_v1.0_scaffolds.hardmasked.fa.gz
Scaffolds masked for repeats (soft masking with lower-case letters)  Prunus_persica_v1.0_scaffolds.softmasked.fa.gz
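As a rough illustration of working with the scaffold file, the sketch below pulls the eight pseudomolecules out of a FASTA stream by name. It assumes the FASTA headers begin with the names given above (scaffold_1 through scaffold_8), which has not been checked against the file itself; the function names are hypothetical.

```python
import gzip


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def pseudomolecules(lines, count=8):
    """Return {name: sequence} for scaffold_1 .. scaffold_<count>."""
    wanted = {f"scaffold_{i}" for i in range(1, count + 1)}
    return {name: seq for name, seq in read_fasta(lines) if name in wanted}


def load_pseudomolecules(path):
    """Read a gzipped FASTA file, e.g. Prunus_persica_v1.0_scaffolds.fa.gz."""
    with gzip.open(path, "rt") as handle:
        return pseudomolecules(handle)
```

The separate Prunus_persica_v1.0_chr.fa.gz download already contains only the pseudomolecules, so this is mainly useful when starting from the full scaffold file.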

 

Gene Predictions

The Prunus persica v1.0 genome gene prediction files are available in FASTA and GFF3 formats. An update on May 16, 2012 added Phytozome PACids to the genes GFF3 file.

Downloads

CDS sequences (FASTA file) Prunus_persica_v1.0_CDS.fa.gz
Peptide sequences  (FASTA file) Prunus_persica_v1.0_peptide.fa.gz
mRNA (transcripts) (FASTA file) Prunus_persica_v1.0_transcript.fa.gz
Genes, CDS, 5' UTR, 3'UTR locations (GFF3 file) Prunus_persica_v1.0_genes.gff3.gz
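A quick sanity check on the genes GFF3 file is to tally features by type and compare the totals with the gene and transcript counts quoted on this page. The sketch below relies only on the GFF3 convention of nine tab-separated columns with the feature type in column 3; which type names actually occur in this file (for example gene or mRNA) is an assumption, and the rows shown are made up.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 records by feature type (column 3)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Made-up records for illustration; real IDs and sources may differ.
example_gff = [
    "##gff-version 3\n",
    "scaffold_1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "scaffold_1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=t1;Parent=g1\n",
    "scaffold_1\tsrc\tCDS\t150\t800\t.\t+\t0\tID=c1;Parent=t1\n",
    "scaffold_1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=t2;Parent=g1\n",
]
feature_counts = count_feature_types(example_gff)
```

For the real file, open Prunus_persica_v1.0_genes.gff3.gz with `gzip.open(path, "rt")` and pass the handle to `count_feature_types`.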

 

Functional Analysis

The Prunus persica v1.0 genome genes were mapped to KEGG pathways and orthologs using the KEGG Automatic Annotation Server (KAAS).    Resulting files are available for download below.

KEGG

Mapping of KEGG orthologs to peach genes Prunus_perisca_v1.0_KEGG.orthologs.txt
KEGG hierarchy file for viewing in the KegHier tool Prunus_perisca_v1.0_KEGG.hier.tar.gz
KEGG pathway maps. Proteins mapped to peach genes are highlighted Prunus_perisca_v1.0_KEGG.map.tar.gz

 

Repeats

The Prunus persica v1.0 genome repeat files are available in GFF3 format. Repeats were predicted using the Repbase database and the LTR Finder and ReAS prediction tools. A consensus file contains repeats from all three methods.

Downloads

The consensus repeats from LTR Finder, Repbase and ReAS Prunus_persica_v1.0_repeats_consensus.gff3.gz
Repeats predicted using repeats from Repbase Prunus_persica_v1.0_repeats_Repbase.gff3.gz
Repeat predictions from LTR Finder Prunus_persica_v1.0_repeats_LTR.gff3.gz
Repeat predictions from ReAS Prunus_persica_v1.0_repeats_ReAS.gff3.gz
Scaffolds masked for repeats (hard masking with N's) Prunus_persica_v1.0_scaffolds.hardmasked.fa.gz
Scaffolds masked for repeats (soft masking with lower-case letters) Prunus_persica_v1.0_scaffolds.softmasked.fa.gz
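The two masking conventions listed above can be told apart programmatically: hard masking replaces repeat bases with N, while soft masking keeps the bases but writes them in lower case. The following minimal sketch computes the masked fraction of a single sequence under either convention; the function name is hypothetical. Note that N's in a hard-masked file can also come from assembly gaps, so the hard-masked figure may overstate repeat content.

```python
def masked_fraction(sequence, mode="soft"):
    """Return the fraction of bases masked in a sequence.

    mode="soft": lower-case letters count as masked.
    mode="hard": N (or n) counts as masked; this also counts assembly gaps.
    """
    if not sequence:
        return 0.0
    if mode == "soft":
        masked = sum(1 for base in sequence if base.islower())
    elif mode == "hard":
        masked = sum(1 for base in sequence if base in "Nn")
    else:
        raise ValueError("mode must be 'soft' or 'hard'")
    return masked / len(sequence)


# Toy sequences for illustration only.
soft_example = masked_fraction("ACGTacgt", mode="soft")
hard_example = masked_fraction("ACNNNNGT", mode="hard")
```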

 

SNPs

SNPs have been provided by several different groups. 

  • The IRSC (International Rosaceae Sequencing Consortium) has mined SNPs for peach, which are included on Illumina Infinium arrays.  These SNPs are provided in Excel format below.  All candidate SNPs are also provided, as well as cherry SNPs that have been mapped to the peach genome.
     
  • SNPs used for two breeding populations: Pop DF ('Dr. Davis' × 'F8, 1-42') and Pop DG ('Dr. Davis' × 'Georgia Belle').

    Riaz Ahmad, Dan E Parfitt, Joseph Fass, Ebenezer Ogundiwin, Amit Dhingra, Thomas M Gradziel, Dawei Lin, Nikhil A Joshi, Pedro J Martinez-Garcia, and Carlos H Crisosto. Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection. BMC Genomics. 2011; 12:569
     

  • Using RNA-seq data from cherry cultivars, SNPs were identified in 3'UTR regions of transcripts and then mapped to the peach genome. Those alignments (and all 3'UTR cherry SNPs) are provided below in Excel and GFF formats.

    T Koepke, S Schaeffer, V Krishnan, D Jiwan, A Harper, M Whiting, N Oraguzie and A Dhingra. 2012 Rapid gene-based SNP and haplotype marker development in non-model eukaryotes using 3′UTR sequencing. BMC Genomics 13: 18.          
     

Downloads

IRSC 9K peach SNPs IRSC_9K_peach_SNP_array.xls
IRSC 9K peach SNPs predicted primers IRSC_array_peach_snps.gff.primer3.output.xls
IRSC 6K cherry SNPs mapped to peach IRSC_6K_cherry_SNP_array.xls
IRSC 40K peach candidate SNPs peach_candidate_snps.xls
IRSC 40K peach candidate SNPs predicted primers peach_candidate_snps.gff.primer3.output.xls
UC Davis 6K SNPs ucd6k_peach_snps.xls
Cherry 3'UTR SNPs aligned to Peach (Excel) SweetCherrySNPs-3_UTR.xlsx
Cherry 3'UTR SNPs aligned to Peach (GFF3) SweetCherrySNPs-3_UTR.gff3.gz

 

RosBREED Resequencing Alignments

A total of 23 different peach accessions were resequenced using Illumina short-read technology.  The reads were trimmed and aligned to the Peach v1.0 genome and are available in BAM alignment files.

It is not necessary to download the BAM alignment files. Some are very large and multiple downloads may oversubscribe the network bandwidth. Rather, use the following instructions for viewing the alignments.

To view the peach resequencing alignments please follow these instructions:

  1. First, download the Prunus_persica_v1.0.zip file. This file contains the reference sequence and gene models. After downloading, unzip this file in your working directory.
  2. Launch the Integrative Genomics Viewer (IGV). Launch the version appropriate for the amount of memory you have available on your computer.
  3. After IGV starts, load the genome file downloaded in the first step by clicking the menu item Genomes > Load Genome From File. Navigate to the folder where you unpacked the zip file from step 1 and select the file named Prunus_persica_v1.0.genome.
  4. Select an alignment file you wish to view by right-clicking on a file with a .bam extension and selecting the option Copy link location (in Chrome and Firefox), Copy shortcut (in Internet Explorer) or Copy link (in Safari).
  5. Add the alignment as a track in IGV by clicking the menu item File > Load from URL. Paste the URL copied in the previous step into the box.
  6. You may load as many alignment files as you want.
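When many alignment files are needed, the menu steps above can also be expressed as an IGV batch script. The sketch below only builds the script text using IGV's `new`, `genome`, `load` and `goto` batch commands; the BAM URL is a placeholder, not a real download location, and the function name is hypothetical. It assumes the .genome file has already been unzipped locally and that each BAM has an index next to it on the server.

```python
def igv_batch_script(genome_path, bam_urls, locus=None):
    """Build the text of an IGV batch script.

    genome_path: path to the unpacked Prunus_persica_v1.0.genome file.
    bam_urls: list of alignment URLs copied from the download page.
    locus: optional region such as "scaffold_1:1-100000".
    """
    lines = ["new", f"genome {genome_path}"]
    lines += [f"load {url}" for url in bam_urls]
    if locus:
        lines.append(f"goto {locus}")
    return "\n".join(lines) + "\n"


# Placeholder URL for illustration; copy the real link as described in step 4.
script = igv_batch_script(
    "Prunus_persica_v1.0.genome",
    ["https://example.org/alignments/accession_01.bam"],
    locus="scaffold_1:1-100000",
)
```

The resulting text can be saved to a file and run from IGV's batch script menu option.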
Tools

Prunus persica tools available on GDR

Additional Prunus persica tools

Assembly Refinements

After the initial assembly of the peach v1.0 genome, some large scaffolds lacked the markers needed for their proper orientation and placement within the pseudomolecules.  Further analysis was performed to locate markers on 10 scaffolds greater than 300 kb in order to place and orient them within the assembly. A refined assembly was then generated.  This refined assembly will be included in a future release of the peach genome, and is reported in the upcoming peach genome publication.  For reference, the following JBrowse viewer is available to visualize changes to the assembly.

View assembly changes in JBrowse

 

 

Markers

The Prunus persica v1.0 genome marker files are available in FASTA and Excel formats with links to GBrowse.

Downloads

Prunus genetic marker sequences in peach (FASTA file) Prunus_map_markers_sequences.fasta
Prunus genetic marker sequences in peach: SSRs only (FASTA file) Prunus_map_markers_sequences_SSRs.fasta
Prunus genetic markers (Excel) GDR_markers_gbrowse.xls
RosCOS Markers aligned to genome (GFF3) RosCos_vs_Peach.markers.gff3.gz