Prunus persica Whole Genome v1.0 Assembly & Annotation
| Analysis Name | Prunus persica Whole Genome v1.0 Assembly & Annotation |
|---|---|
| Software | Arachne |
| Source | Prunus persica Sanger Reads |
| Date performed | 2012-01-16 |
| Materials & Methods | Reference: The International Peach Genome Initiative. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet. 2013 Mar 24. doi: 10.1038/ng.2586. [Epub ahead of print] Background: Peach (Prunus persica) is one of the best genetically characterized species in the Rosaceae, and it has distinct advantages that make it suitable as a model genome species for Prunus as well as for other species in the Rosaceae. While some Prunus species, such as cultivated plums and sour cherries, are polyploid, peach is a diploid with n = 8 and has a comparatively small genome, currently estimated at ~220-230 Mbp based on the peach v1.0 assembly. Peach has a relatively short juvenility period of 2-3 years, compared with the 6-10 years required by most other fruit tree species. In addition, a number of genes for fundamentally important traits have been genetically described in peach, including genes controlling flower and fruit development, tree growth habit, dormancy, cold hardiness, and disease and pest resistance. Genome facts and statistics: Peach v1.0 was generated from DNA of the doubled haploid cultivar ‘Lovell’, which means that the genes and intervening DNA are “fixed”, that is, identical for all alleles and both chromosomal copies of the genome. This doubled haploid nature was confirmed by the evaluation of >200 SSRs, and has facilitated a highly accurate and consistent assembly of the peach genome. Access to the peach genome is provided by GDR, JGI Phytozome and IGA. |
Prunus persica tools available on GDR
- View the Peach v1.0 assembly and annotations in GBrowse
- View the Peach v1.0 syntenic comparisons with Strawberry
- Blast your sequences against peach data using the GDR NCBI BLAST Server - results returned to your screen
- Blast your sequences against peach data using the GDR BATCH BLAST Server - results returned in an Excel file
Additional Prunus persica tools
- Visualize the v1.0 genome assembly and gene models in the Integrated Genome Browser (IGB). You may load RNA-Seq, ChIP-Seq, tiling array data, etc.
The Prunus persica v1.0 genome assembly files are available in FASTA and GFF3 formats. There are a total of 202 scaffolds in this assembly of peach. The pseudomolecules corresponding to the eight chromosomes of peach are the first eight scaffolds of the assembly. In future releases these pseudomolecules will most likely be renamed, but for now they are named scaffold_1, scaffold_2, scaffold_3, etc.
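Because the pseudomolecules are identified only by their scaffold names, it can be handy to pull them out of the full scaffold file by name. The following is a minimal sketch, assuming that the FASTA record IDs are exactly scaffold_1 through scaffold_8 (the first word of each header line); the function names are illustrative and not part of any GDR tool.

```python
def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


# Assumption: the eight pseudomolecules carry exactly these record IDs.
PSEUDOMOLECULES = {f"scaffold_{i}" for i in range(1, 9)}


def pseudomolecules(lines):
    """Keep only the records whose ID is one of the eight pseudomolecules."""
    for header, seq in iter_fasta(lines):
        words = header.split()
        record_id = words[0] if words else ""
        if record_id in PSEUDOMOLECULES:
            yield header, seq
```

To run this on the compressed download, the file can be opened in text mode with `gzip.open("Prunus_persica_v1.0_scaffolds.fa.gz", "rt")` and the handle passed as `lines`.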
Downloads
| Scaffolds (FASTA file) | Prunus_persica_v1.0_scaffolds.fa.gz |
| Scaffolds (GFF3 file) | Prunus_persica_v1.0_scaffolds.gff3.gz |
| Pseudomolecules (first 8 scaffolds) (FASTA file) | Prunus_persica_v1.0_chr.fa.gz |
| Scaffolds masked for repeats (hard masking with N's) | Prunus_persica_v1.0_scaffolds.hardmasked.fa.gz |
| Scaffolds masked for repeats (soft masking with lower-case letters) | Prunus_persica_v1.0_scaffolds.softmasked.fa.gz |
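The two masked files encode the same repeat information in different ways: soft masking keeps the bases but writes them in lower case, while hard masking replaces them with N. As a small illustration of that relationship (a sketch only, not the procedure used to produce the files above), a soft-masked sequence can be converted to a hard-masked one and its masked fraction measured as follows.

```python
def hard_mask(seq):
    """Convert soft masking (lower-case bases) to hard masking (N)."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of positions that are lower-case or N.

    Note: N characters that represent assembly gaps rather than repeats
    are counted as well, because the sequence alone cannot tell them apart.
    """
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)
```

Soft masking is usually the more flexible starting point, since the hard-masked form can be derived from it but not the other way around.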
After the initial assembly of the peach v1.0 genome, some large scaffolds lacked the markers needed to orient and place them properly within the pseudomolecules. Further analysis was performed to locate markers on 10 scaffolds larger than 300 kb so that they could be placed and oriented within the assembly. A refined assembly was then generated. This refined assembly will be included in a future release of the peach genome and is reported in the upcoming peach genome publication. For reference, the following JBrowse viewer is available to visualize changes to the assembly.
View assembly changes in JBrowse
The Prunus persica v1.0 genome gene prediction files are available in FASTA and GFF3 formats. An update on May 16, 2012 added Phytozome PACids to the genes GFF3 file.
Downloads
| CDS sequences (FASTA file) | Prunus_persica_v1.0_CDS.fa.gz |
| Peptide sequences (FASTA file) | Prunus_persica_v1.0_peptide.fa.gz |
| mRNA (transcripts) (FASTA file) | Prunus_persica_v1.0_transcript.fa.gz |
| Genes, CDS, 5' UTR, 3'UTR locations (GFF3 file) | Prunus_persica_v1.0_genes.gff3.gz |
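The GFF3 file stores one feature per line in nine tab-separated columns, with the last column holding `key=value` attributes separated by semicolons. A minimal sketch of reading such a file and tallying feature types is shown below; it follows the general GFF3 layout and makes no assumption about which attribute keys or feature type names the peach file actually uses.

```python
from collections import Counter


def parse_gff3(lines):
    """Yield one dict per feature line of a GFF3 file."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        attributes = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),  # 1-based, inclusive
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": attributes,
        }


def count_feature_types(lines):
    """Count how many features of each type (column 3) the file contains."""
    return Counter(feature["type"] for feature in parse_gff3(lines))
```

As with the FASTA files, the compressed download can be read by passing a handle from `gzip.open(..., "rt")` as `lines`.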
The Prunus persica v1.0 genome homology files are available for download in Excel format with links to GBrowse and to external databases for matched homologs. All homology data was determined using the predicted peach gene transcripts (28,692 sequences) and NCBI blastx against various protein databases. An expectation value cutoff of 1e-6 was used. For EST alignments the NCBI Rosaceae and Genera EST databases were downloaded, and filtered for quality before blasting.
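To illustrate what an expectation value cutoff of 1e-6 means in practice, the sketch below filters BLAST hits and keeps the best hit per query. It assumes the standard 12-column tabular BLAST output (query ID in column 1, subject ID in column 2, e-value in column 11) and treats hits at or below the cutoff as passing; neither the exact output format nor the best-hit selection is stated on this page, so both are assumptions.

```python
EVALUE_CUTOFF = 1e-6  # cutoff stated for the homology analysis


def best_hits(lines, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the lowest e-value hit per query.

    Hits with an e-value above the cutoff are discarded.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a standard tabular hit line
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

A transcript that has no hit at or below the cutoff simply does not appear in the result, which is consistent with the homolog counts below being smaller than the 28,692 transcripts searched.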
Protein Homologs
| predicted gene functions | Prunus_persica_v1.0_gene_function.xls |
| 24,423 peach gene transcripts with Arabidopsis homologs | Prunus_persica_v1.0_vs_arabidopsis.xls |
| 18,822 peach gene transcripts with Swiss-Prot homologs | Prunus_persica_v1.0_vs_swissprot.xls |
| 26,731 peach gene transcripts with TrEMBL homologs | Prunus_persica_v1.0_vs_trembl.xls |
Rosaceae EST alignments
| Apple ESTs | Prunus_persica_v1.0_vs_malus_ESTs.xls |
| Strawberry ESTs | Prunus_persica_v1.0_vs_fragaria_ESTs.xls |
| Prunus ESTs | Prunus_persica_v1.0_vs_prunus_ESTs.xls |
| Rosa ESTs | Prunus_persica_v1.0_vs_rosa_ESTs.xls |
| Rubus ESTs | Prunus_persica_v1.0_vs_rubus_ESTs.xls |
| Pyrus ESTs | Prunus_persica_v1.0_vs_pyrus_ESTs.xls |
The Prunus persica v1.0 genome marker files are available in FASTA and Excel format with links to GBrowse.
Downloads
| Prunus genetic marker sequences in peach (FASTA file) | Prunus_map_markers_sequences.fasta |
| Prunus genetic marker sequences in peach: SSRs only (FASTA file) | Prunus_map_markers_sequences_SSRs.fasta |
| Prunus genetic markers (Excel) | GDR_markers_gbrowse.xls |
| RosCOS Markers aligned to genome (GFF3) | RosCos_vs_Peach.markers.gff3.gz |
SNPs have been provided by several different groups.
- The IRSC (International Rosaceae Sequencing Consortium) has mined SNPs for peach which are included on Illumina Infinium arrays. These SNPs are provided in Excel format below. All candidate SNPs are also available, as well as cherry SNPs that have been mapped to the peach genome.
- SNPs used for two breeding populations, Pop DF ('Dr. Davis' × 'F8, 1-42') and Pop DG ('Dr. Davis' × 'Georgia Belle').
  Riaz Ahmad, Dan E Parfitt, Joseph Fass, Ebenezer Ogundiwin, Amit Dhingra, Thomas M Gradziel, Dawei Lin, Nikhil A Joshi, Pedro J Martinez-Garcia, and Carlos H Crisosto. Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection. BMC Genomics. 2011; 12:569.
- Using RNA-seq data from cherry cultivars, SNPs were identified in 3'UTR regions of transcripts and then mapped to the peach genome. Those alignments (and all 3'UTR cherry SNPs) are provided below in Excel and GFF formats.
  T Koepke, S Schaeffer, V Krishnan, D Jiwan, A Harper, M Whiting, N Oraguzie and A Dhingra. 2012 Rapid gene-based SNP and haplotype marker development in non-model eukaryotes using 3′UTR sequencing. BMC Genomics 13: 18.
Downloads
| IRSC 9K peach SNPs | IRSC_9K_peach_SNP_array.xls |
| IRSC 9K peach SNPs predicted primers | IRSC_array_peach_snps.gff.primer3.output.xls |
| IRSC 6K cherry SNPs mapped to peach | IRSC_6K_cherry_SNP_array.xls |
| IRSC 40K peach candidate SNPs | peach_candidate_snps.xls |
| IRSC 40K peach candidate SNPs predicted primers | peach_candidate_snps.gff.primer3.output.xls |
| UC Davis 6K SNPs | ucd6k_peach_snps.xls |
| Cherry 3'UTR SNPs aligned to Peach (Excel) | SweetCherrySNPs-3_UTR.xlsx |
| Cherry 3'UTR SNPs aligned to Peach (GFF3) | SweetCherrySNPs-3_UTR.gff3.gz |
The Prunus persica v1.0 genome genes were mapped to KEGG pathways and orthologs using the KEGG Automatic Annotation Server (KAAS). Resulting files are available for download below.
KEGG
| Mapping of KEGG orthologs to peach genes | Prunus_perisca_v1.0_KEGG.orthologs.txt |
| KEGG hierarchy file for viewing in the KegHier tool | Prunus_perisca_v1.0_KEGG.hier.tar.gz |
| KEGG pathway maps. Proteins mapped to peach genes are highlighted | Prunus_perisca_v1.0_KEGG.map.tar.gz |
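Once downloaded, the ortholog mapping can be inverted to see which peach genes share a KEGG ortholog. The sketch below assumes the mapping is a tab-delimited text file with the gene ID in the first column and the KEGG ortholog (K number) in the second, and that genes without an assignment have an empty or missing second column. That layout is an assumption about the file, not something stated here, so it should be checked against the actual download first.

```python
from collections import defaultdict


def genes_by_ortholog(lines):
    """Group gene IDs by KEGG ortholog.

    Assumed layout (hypothetical): gene_id <TAB> K_number, one gene per line.
    Lines without a K number are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2 or not cols[1].strip():
            continue  # gene has no ortholog assignment
        groups[cols[1].strip()].append(cols[0].strip())
    return dict(groups)
```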
The Prunus persica v1.0 genome repeat files are available in GFF3 format. Repeats were predicted using the Repbase database, LTR Finder and ReAS prediction tools. A consensus file contains repeats from all three methods.
Downloads
| The consensus repeats from LTR Finder, Repbase and ReAS | Prunus_persica_v1.0_repeats_consensus.gff3.gz |
| Repeats predicted using repeats from Repbase | Prunus_persica_v1.0_repeats_Repbase.gff3.gz |
| Repeat predictions from LTR Finder | Prunus_persica_v1.0_repeats_LTR.gff3.gz |
| Repeat predictions from ReAS | Prunus_persica_v1.0_repeats_ReAS.gff3.gz |
| Scaffolds masked for repeats (hard masking with N's) | Prunus_persica_v1.0_scaffolds.hardmasked.fa.gz |
| Scaffolds masked for repeats (soft masking with lower-case letters) | Prunus_persica_v1.0_scaffolds.softmasked.fa.gz |
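Because the repeat predictions come from three different methods, features in these GFF3 files may overlap one another, and summing feature lengths could then overstate how much of the genome is repetitive. The sketch below shows one way to count the non-redundant bases covered per scaffold by merging overlapping intervals. It relies only on the standard GFF3 columns (sequence ID in column 1, 1-based inclusive start and end in columns 4 and 5) and is not a description of how the consensus file itself was built.

```python
from collections import defaultdict


def covered_bases(lines):
    """Return {seqid: number of bases covered by at least one feature}."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))

    totals = {}
    for seqid, spans in intervals.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        total = 0
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1  # close the merged interval
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        totals[seqid] = total
    return totals
```

Dividing each total by the scaffold length gives a per-scaffold repeat fraction that can be compared with the masked fraction of the soft-masked or hard-masked scaffolds.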