Prunus persica Chinese Cling Whole Genome v1.0 Assembly & Annotation
A high-quality peach genome assembly to analyze the molecular evolutionary mechanism of volatile compounds in peach fruits (to be submitted by Cao Ke at Zhengzhou Fruit Research Institute, CAAS, China).
Peach is an economically important fruit tree worldwide. Although a high-quality peach genome has been published, the accession used for that assembly, ‘Lovell’, is a wild resource. Here, we report a chromosome-level genome assembly and sequence analysis of ‘Chinese Cling’, an important parent material for breeding programs worldwide, produced by combining high-throughput Illumina sequencing, Single Molecule, Real-Time (SMRT) sequencing, and high-throughput chromosome conformation capture (Hi-C) technology.
Genome facts and statistics
A total of 107× coverage of Illumina short reads and 176× coverage of PacBio single-molecule long reads were sequenced and assembled into a high-quality reference genome of 247.33 megabases (Mb). The assembly covered 99.8% of the genome size estimated by k-mer analysis (249.8 Mb). The contig N50 was 4.13 Mb. To improve the scaffold assembly, high-throughput chromatin conformation capture (Hi-C) data were generated at 121× coverage (30.21 Gb). Scaffolding with the Hi-C data allowed accurate clustering and ordering of the contigs into 8 pseudo-chromosomes, resulting in a scaffold N50 of 29.68 Mb. The quality and completeness of the assembled genome were evaluated using the GC-depth distribution, Illumina read mapping, and BUSCO gene analysis. Together, these evaluations indicated that the assembly is highly complete.
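For readers unfamiliar with the statistics quoted above, the following is a minimal sketch of how N50 and sequencing depth are conventionally computed. The function names and the toy contig lengths are hypothetical and are not taken from the assembly pipeline. Dividing the Hi-C yield by the k-mer genome size estimate is an assumption about which denominator was used; it gives a value of about 121.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def sequencing_depth(total_bases, genome_size):
    """Average depth (fold coverage) = sequenced bases / genome size."""
    return total_bases / genome_size


# Toy contig lengths, for illustration only (not real assembly data).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)

# Assumption: depth is computed against the 249.8 Mb k-mer estimate.
hic_depth = sequencing_depth(30.21e9, 249.8e6)

print(toy_n50)
print(round(hic_depth))
```

The same `n50` function applies to contigs and scaffolds alike; only the list of input lengths differs.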
Approximately 114.66 Mb of transposable elements (TEs) were identified in the ‘Chinese Cling’ genome. By combining these predictions with RNA-seq results, we identified 26,335 high-confidence protein-coding genes. In addition, by comparison with known non-coding RNA libraries, we annotated rRNA, snRNA, miRNA and tRNA in the ‘Chinese Cling’ genome. Of the 26,335 genes, 97.3% could be annotated to known proteins.
Homology of the Prunus persica Chinese Cling genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
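As an illustration of the cutoff-and-best-hit step described above, here is a minimal sketch that filters blastp results by expectation value and keeps one hit per query. It assumes the standard 12-column BLAST tabular layout (`-outfmt 6`), which this page does not confirm was used, and it assumes "best" means the highest bit score. The function name and the query and subject identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, evalue_cutoff):
    """Keep, per query, the highest-bit-score hit with e-value < cutoff.

    Expects the default 12 tab-separated columns of BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= evalue_cutoff:
            continue  # the cutoff is "less than", so equal values fail
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits, for illustration only.
rows = [
    ["geneA", "subj1", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-30", "200"],
    ["geneA", "subj2", "95.0", "100", "5", "0", "1", "100", "1", "100", "1e-40", "250"],
    ["geneB", "subj3", "40.0", "50", "30", "0", "1", "50", "1", "50", "1e-7", "45"],
]
sample = "\n".join("\t".join(r) for r in rows)

strict = best_hits(sample, 1e-9)   # cutoff quoted for NCBI nr
relaxed = best_hits(sample, 1e-6)  # cutoff quoted for the other databases
print(sorted(strict))
print(sorted(relaxed))
```

With the stricter cutoff only `geneA` retains a hit; the hypothetical `geneB` hit at 1e-7 passes only the 1e-6 cutoff.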
All assembly and annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files on the GDR data repository.
The Prunus persica Chinese Cling v1.0 genome gene prediction files are available in FASTA and GFF3 formats.
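Once downloaded, a GFF3 file can be summarized with a few lines of code. The sketch below counts features by type and parses the attribute column, relying only on the general GFF3 layout (nine tab-separated columns, `key=value` attributes separated by semicolons). The function names, the source label and the gene identifiers in the sample are hypothetical and do not reflect the naming used in the actual files.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 records by feature type (column 3)."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature record (e.g. embedded FASTA)
        counts[columns[2]] += 1
    return counts


def parse_attributes(field):
    """Turn the ninth GFF3 column into a dict of key -> value."""
    return dict(pair.split("=", 1) for pair in field.split(";") if "=" in pair)


# Hypothetical records, for illustration only.
sample = [
    "##gff-version 3",
    "Chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001",
    "Chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=gene0001.t1;Parent=gene0001",
    "Chr1\texample\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=gene0001.t1",
    "Chr1\texample\tCDS\t500\t900\t.\t+\t0\tID=cds2;Parent=gene0001.t1",
]

counts = count_feature_types(sample)
attrs = parse_attributes(sample[2].split("\t")[8])
print(dict(counts))
print(attrs)
```

For a real file, pass an open file handle instead of the sample list, for example `count_feature_types(open(path))` with `path` pointing at the downloaded GFF3 file.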
Functional annotation files for the Prunus persica Chinese Cling Genome v1.0 are available for download below. The Prunus persica Chinese Cling Genome v1.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).