Prunus persica Chinese Cling Whole Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Prunus persica Chinese Cling Whole Genome v1.0 Assembly & Annotation
Method: SMARTDenovo, WTDBG2 (na)
Source: Prunus persica Illumina, PacBio, and Hi-C Reads
Date performed: 2020-11-26

Publication

Cao K, Yang X, Li Y, Zhu G, Fang W, Chen C, Wang X, Wu J, Wang L. New high-quality peach (Prunus persica L. Batsch) genome assembly to analyze the molecular evolutionary mechanism of volatile compounds in peach fruits. The Plant Journal: for Cell and Molecular Biology. 2021 Jul 26. https://doi.org/10.1111/tpj.15439

Background

Peach is an economically important tree crop worldwide. Although a high-quality peach genome has been published, the accession used to assemble it is a wild resource, ‘Lovell’. Here, we report a chromosome-level genome assembly and sequence analysis of ‘Chinese Cling’, an important parent material for breeding programs worldwide, produced by combining high-throughput Illumina sequencing, Single Molecule, Real-Time (SMRT) sequencing, and high-throughput chromosome conformation capture (Hi-C) technology.

Genome facts and statistics

A total of 107× coverage of Illumina short reads and 176× coverage of PacBio single-molecule long reads were sequenced and assembled into a high-quality reference genome of 247.33 megabases (Mb). The assembly covered 99.8% of the genome size estimated by k-mer analysis (249.8 Mb). The contig N50 was 4.13 Mb. To improve the scaffold assembly, high-throughput chromatin conformation capture (Hi-C) data were generated at 121× coverage (30.21 Gb). Scaffolding with the Hi-C data allowed accurate clustering and ordering of the contigs into 8 pseudo-chromosomes covering the genome, resulting in a scaffold N50 of 29.68 Mb. The quality and completeness of the assembly were evaluated using the GC-depth distribution, Illumina read mapping, and BUSCO gene analysis. Together, these evaluations indicated that the assembly is highly complete.
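For readers less familiar with the statistics quoted above, the following is a minimal Python sketch of how an N50 value and a sequencing depth are commonly computed. It is not the authors' pipeline. The function names `n50` and `sequencing_depth` and the toy contig lengths are hypothetical, and dividing by the k-mer estimated genome size (249.8 Mb) is an assumption about how the Hi-C coverage figure was derived.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


def sequencing_depth(total_bases, genome_size):
    """Return the average depth (x coverage) for a given amount of sequence."""
    return total_bases / genome_size


# Toy contig lengths (hypothetical, not taken from the real assembly).
toy_contigs = [8, 6, 4, 3, 2, 1]
print(n50(toy_contigs))

# Hi-C data volume from the text (30.21 Gb) over the estimated genome size.
# Assumption: depth is measured against the 249.8 Mb k-mer estimate.
hic_depth = sequencing_depth(30.21e9, 249.8e6)
print(round(hic_depth))
```

Under that assumption the computed Hi-C depth rounds to the 121× reported in the text.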

Approximately 114.66 Mb of transposable elements (TEs) were identified in the ‘Chinese Cling’ genome. By combining these predictions with RNA-seq results, we identified 26,335 high-confidence protein-coding genes. In addition, by comparison with known non-coding RNA libraries, we annotated rRNA, snRNA, miRNA, and tRNA in the ‘Chinese Cling’ genome. Of the 26,335 genes, 97.3% could be annotated to known proteins.

Homology

Homology of the Prunus persica Chinese Cling genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for NCBI nr (Release 2018-05), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
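The E-value filtering and best-hit selection described above can be sketched in Python as follows. This is an illustration only, not the pipeline that produced the reports. It assumes BLAST tabular output with the standard 12 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), defines "best hit" as the highest bit score per query, and uses made-up sequence identifiers; the function name `best_hits` is hypothetical.

```python
import csv
import io


def best_hits(tabular_text, evalue_cutoff):
    """Keep, for each query, the hit with the highest bit score among
    hits whose E-value is below the cutoff."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= evalue_cutoff:
            continue  # hit does not pass the E-value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy tabular hits with hypothetical identifiers.
toy_hits = (
    "geneA\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "geneA\tsubj2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120\n"
    "geneB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t45\n"
)

# Stricter cutoff (as used for NCBI nr) versus the cutoff used for the
# Arabidopsis, SwissProt, and TrEMBL searches.
print(best_hits(toy_hits, 1e-9))
print(best_hits(toy_hits, 1e-6))
```

In this toy input, geneB has a hit only under the looser 1e-6 cutoff, which mirrors how a protein can fall into a "with homologs" file for one database and a "without" file for another.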

 

Protein Homologs

Prunus persica Chinese Cling v1.0 proteins with NCBI nr homologs (EXCEL file) Pp_chinesecling_v1.0_vs_nr.xlsx.gz
Prunus persica Chinese Cling v1.0 proteins with NCBI nr (FASTA file) Pp_chinesecling_v1.0_vs_nr_hit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins without NCBI nr (FASTA file) Pp_chinesecling_v1.0_vs_nr_noHit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Pp_chinesecling_v1.0_vs_arabidopsis.xlsx.gz
Prunus persica Chinese Cling v1.0 proteins with Arabidopsis (Araport11) (FASTA file) Pp_chinesecling_v1.0_vs_arabidopsis_hit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins without Arabidopsis (Araport11) (FASTA file) Pp_chinesecling_v1.0_vs_arabidopsis_noHit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins with SwissProt homologs (EXCEL file) Pp_chinesecling_v1.0_vs_swissprot.xlsx.gz
Prunus persica Chinese Cling v1.0 proteins with SwissProt (FASTA file) Pp_chinesecling_v1.0_vs_swissprot_hit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins without SwissProt (FASTA file) Pp_chinesecling_v1.0_vs_swissprot_noHit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins with TrEMBL homologs (EXCEL file) Pp_chinesecling_v1.0_vs_trembl.xlsx.gz
Prunus persica Chinese Cling v1.0 proteins with TrEMBL (FASTA file) Pp_chinesecling_v1.0_vs_trembl_hit.fasta.gz
Prunus persica Chinese Cling v1.0 proteins without TrEMBL (FASTA file) Pp_chinesecling_v1.0_vs_trembl_noHit.fasta.gz

 

Download

All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Assembly

The Prunus persica Chinese Cling Genome v1.0 assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) Prunus_persica_ChineseCling_v1.0.fasta.gz
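Once downloaded, the gzipped FASTA file can be read without decompressing it first. The sketch below, using only the Python standard library, tallies the length of each sequence; the function name `fasta_lengths` is hypothetical, and the local path in the usage comment assumes the file was saved under its original name in the working directory.

```python
import gzip


def fasta_lengths(path):
    """Return a dict mapping each sequence ID in a gzipped FASTA file
    to its length in residues."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the ID.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Usage (assumes the file is in the current directory):
# lengths = fasta_lengths("Prunus_persica_ChineseCling_v1.0.fasta.gz")
# print(len(lengths), sum(lengths.values()))
```

The summed lengths can then be compared with the assembly size reported on this page.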

 

Gene Predictions

The Prunus persica Chinese Cling v1.0 genome gene prediction files are available in FASTA and GFF3 formats.

Downloads

Protein sequences  (FASTA file) Prunus_persica_ChineseCling_v1.0.proteins.fasta.gz
Genes (GFF3 file) Prunus_persica_ChineseCling_v1.0.genes.gff3.gz
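As a quick sanity check on the GFF3 download, feature types can be counted from the third column of the file. This is a minimal standard-library sketch; the function name `count_feature_types` is hypothetical, and which feature types (for example `gene` or `mRNA`) appear in this particular file is an assumption to verify against the file itself.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records per feature type (column 3) in a gzipped file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:  # GFF3 records have nine tab-separated columns
                counts[columns[2]] += 1
    return counts


# Usage (assumes the file is in the current directory):
# counts = count_feature_types("Prunus_persica_ChineseCling_v1.0.genes.gff3.gz")
# print(counts.most_common())
```

If the file uses the conventional `gene` feature type, its count would be expected to be close to the number of protein-coding genes reported on this page.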

 

Functional Analysis

Functional annotation files for the Prunus persica Chinese Cling Genome v1.0 are available for download below. The Prunus persica Chinese Cling Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan Pp_chinesecling_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan Pp_chinesecling_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Pp_chinesecling_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Pp_chinesecling_v1.0_KEGG-pathways.xlsx.gz