Prunus salicina Lindl cv. Wushancuili Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Prunus salicina Lindl cv. Wushancuili Genome v1.0 Assembly & Annotation
Method: Hifiasm (v0.16.1)
Source: Illumina, PacBio HiFiⅢ, Hi-C for Wushancuili’s fruit
Date performed: 2023-05-01

Publication 

Zhou, K., Wang, J., Pan, L., Xiang, F., Zhou, Y., Xiong, W., Zeng, M., Grierson, D., Kong, W., Hu, L., & Xi, W. (2023). A chromosome-level genome assembly for Chinese plum 'Wushancuili' reveals the molecular basis of its fruit color and susceptibility to rain-cracking. Horticultural Plant Journal, https://doi.org/10.1016/j.hpj.2023.04.011

Description

Here we present a chromosome-level genome assembly of Prunus salicina Lindl cv. Wushancuili, produced by combining PacBio sequencing, Illumina sequencing, and Hi-C technology. The assembly has a total size of 302.17 Mb, with a contig N50 of 23,590,757 bp and a scaffold N50 of 33,711,648 bp. Of the assembled sequences, 96.56% were anchored onto 8 pseudo-chromosomes. De novo, homology-based, and RNA-seq methods were used together to predict 25,304 protein-coding genes, 99.23% of which were functionally annotated. BUSCO analysis showed 98.95% complete genes. CEGMA assessment showed that the assembly captured 233 (93.95%) complete core genes.

Table 1. Summary of ‘Wushancuili’ genome assembly and annotation.

Parameter                                      Value
Genome assembly size (bp)                      302,171,150
Number of contigs (≥2 kb)                      518
N50 contig length (bp)                         23,590,757
Number of scaffolds                            501
N50 scaffold length (bp)                       33,711,648
GC content (%)                                 38.49
CEGs (conserved core eukaryotic genes, %)      95.97
BUSCOs (%)                                     98.95
All repeat sequences (%)                       54.17
TEs (transposable elements, %)                 53.05
LTR (long terminal repeat, %)                  45.60
Protein-coding gene number                     25,304

 

Homology

Homology of the Prunus salicina Lindl cv. Wushancuili genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.
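The best-hit selection described above can be sketched as follows. This is a minimal illustration, not the pipeline used for this annotation: it assumes the blastp results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`), and it assumes that "best hit" means the highest bit score per query. The function name `best_hits` and the gene and subject identifiers in the demo are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff stated in the text above


def best_hits(handle, evalue_cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)}.

    Keeps, for each query, the highest-bitscore hit whose e-value is
    below the cutoff. Assumes 12-column BLAST tabular input
    (column 11 = e-value, column 12 = bit score).
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= evalue_cutoff:
            continue  # hit is not significant enough
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy input with invented identifiers (not real results).
demo = io.StringIO(
    "geneA\tAT1G01010.1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t310\n"
    "geneA\tAT2G02020.1\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t150\n"
    "geneB\tAT3G03030.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(demo)
all_queries = {"geneA", "geneB"}
no_hit = sorted(all_queries - set(hits))
print(hits)
print(no_hit)
```

In this toy input, geneA keeps its stronger hit and geneB has no hit below the cutoff, which is the kind of split reflected in the hit and noHit FASTA files listed below.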

Protein Homologs

Prunus salicina v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) psalicina_Wushancuili_v1.0_vs_arabidopsis.xlsx.gz
Prunus salicina v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) psalicina_Wushancuili_v1.0_vs_arabidopsis_hit.fasta.gz
Prunus salicina v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) psalicina_Wushancuili_v1.0_vs_arabidopsis_noHit.fasta.gz
Prunus salicina v1.0 proteins with SwissProt homologs (EXCEL file) psalicina_Wushancuili_v1.0_vs_swissprot.xlsx.gz
Prunus salicina v1.0 proteins with SwissProt homologs (FASTA file) psalicina_Wushancuili_v1.0_vs_swissprot_hit.fasta.gz
Prunus salicina v1.0 proteins without SwissProt homologs (FASTA file) psalicina_Wushancuili_v1.0_vs_swissprot_noHit.fasta.gz
Prunus salicina v1.0 proteins with TrEMBL homologs (EXCEL file) psalicina_Wushancuili_v1.0_vs_trembl.xlsx.gz
Prunus salicina v1.0 proteins with TrEMBL homologs (FASTA file) psalicina_Wushancuili_v1.0_vs_trembl_hit.fasta.gz
Prunus salicina v1.0 proteins without TrEMBL homologs (FASTA file) psalicina_Wushancuili_v1.0_vs_trembl_noHit.fasta.gz

Assembly

The Prunus salicina Genome v1.0 assembly files are available in FASTA and GFF3 formats.

Downloads

Chromosomes (FASTA file) Psalicina_Wushancuili_v1.0.fasta.gz
Repeats (GFF3 file) Psalicina_Wushancuili_v1.0.repeats.gff.gz
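As a quick sanity check after downloading, summary statistics such as total size, N50, and GC content can be recomputed from the assembly FASTA. The sketch below is one simple way to do that with the Python standard library; the function names are hypothetical, and the toy sequences are invented. Because the downloadable file contains scaffolded chromosomes, the N50 computed from it would correspond to a scaffold N50, not the contig N50.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length, gc_count) for each record of a FASTA text stream."""
    name, length, gc = None, 0, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length, gc
            name = (line[1:].split() or [""])[0]
            length, gc = 0, 0
        else:
            seq = line.upper()
            length += len(seq)
            gc += seq.count("G") + seq.count("C")
    if name is not None:
        yield name, length, gc


def n50(lengths):
    """Return the length L such that sequences of length >= L
    cover at least half of the total size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(handle):
    """Summarize a FASTA stream; GC% uses all bases (including N) as denominator."""
    records = list(fasta_lengths(handle))
    lengths = [r[1] for r in records]
    total = sum(lengths)
    gc = sum(r[2] for r in records)
    return {
        "n_sequences": len(lengths),
        "total_bp": total,
        "n50_bp": n50(lengths),
        "gc_percent": round(100 * gc / total, 2) if total else 0.0,
    }


# Toy input (invented sequences), so the example runs without the download.
demo = io.StringIO(">seq1 toy record\nGGCC\nAT\n>seq2\nAAAA\n>seq3\nGC\n")
print(assembly_stats(demo))
# {'n_sequences': 3, 'total_bp': 12, 'n50_bp': 6, 'gc_percent': 50.0}
```

For the real file, a text handle from `gzip.open("Psalicina_Wushancuili_v1.0.fasta.gz", "rt")` can be passed to `assembly_stats` in place of the toy input. Small differences from Table 1 are possible, for example because gap characters are counted in the GC denominator here.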

 

Gene Predictions

The Prunus salicina v1.0 genome gene prediction files are available in FASTA and GFF3 formats.

Downloads

Protein sequences  (FASTA file) Psalicina_Wushancuili_v1.0.proteins.fasta.gz
CDS  (FASTA file) Psalicina_Wushancuili_v1.0.cds.fasta.gz
Genes (GFF3 file) Psalicina_Wushancuili_v1.0.genes.gff3.gz
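To check a downloaded annotation against the reported gene number, the feature types in the GFF3 file can be tallied. The following is a minimal sketch using only the standard library. It assumes that each protein-coding gene is represented by a feature of type `gene` in column 3, which is common but not confirmed for this file; the function name and the toy records are hypothetical.

```python
import io
from collections import Counter


def count_feature_types(handle):
    """Count GFF3 features by type (column 3) in a text stream."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an optional embedded sequence section follows; stop here
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column feature line
        counts[fields[2]] += 1
    return counts


# Toy GFF3 content with invented IDs and coordinates.
demo = io.StringIO(
    "##gff-version 3\n"
    "Chr1\tdemo\tgene\t1\t900\t.\t+\t.\tID=g1\n"
    "Chr1\tdemo\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "Chr1\tdemo\tCDS\t1\t900\t.\t+\t0\tID=g1.t1.cds;Parent=g1.t1\n"
    "Chr2\tdemo\tgene\t50\t400\t.\t-\t.\tID=g2\n"
)
counts = count_feature_types(demo)
print(counts["gene"], counts["mRNA"], counts["CDS"])
```

For the real file, pass a handle from `gzip.open("Psalicina_Wushancuili_v1.0.genes.gff3.gz", "rt")`. If the assumption about feature types holds, `counts["gene"]` would be expected to be close to the 25,304 protein-coding genes reported above.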

 

Functional Analysis

Functional annotations for the Prunus salicina Lindl cv. Wushancuili genome v1.0 are available for download below. The Prunus salicina genome v1.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan psalicina_Wushancuili_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan psalicina_Wushancuili_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs psalicina_Wushancuili_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways psalicina_Wushancuili_v1.0_KEGG-pathways.xlsx.gz

Transcript Alignments

Transcript alignments were performed by the GDR Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Prunus salicina Lindl cv. Wushancuili genome assembly. Alignments with an alignment length of 97% and 97% identity were preserved. The available files are in GFF3 format.

Fragaria x ananassa GDR RefTrans v1 psalicina_Wushancuili_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 psalicina_Wushancuili_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 psalicina_Wushancuili_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 psalicina_Wushancuili_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 psalicina_Wushancuili_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 psalicina_Wushancuili_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 psalicina_Wushancuili_v1.0_pyrus_GDR_reftransV1
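
The 97% alignment length and 97% identity filter described for the transcript alignments can be illustrated on BLAT's native PSL output. This is a sketch under stated assumptions, not the GDR team's procedure: it assumes coverage is the aligned span of the query divided by the query size, and identity is matches (including repeat matches) divided by aligned bases, ignoring gaps. The exact definitions used for the released files are not given in the text. The function names and the toy records are hypothetical.

```python
import io


def passes_filter(fields, min_cov=0.97, min_ident=0.97):
    """Apply coverage and identity thresholds to one PSL record (list of 21 strings).

    Assumed definitions (not confirmed by the source):
      coverage = (qEnd - qStart) / qSize
      identity = (matches + repMatches) / (matches + misMatches + repMatches)
    """
    matches, mismatches, rep_matches = (int(fields[i]) for i in (0, 1, 2))
    q_size, q_start, q_end = (int(fields[i]) for i in (10, 11, 12))
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    coverage = (q_end - q_start) / q_size
    identity = (matches + rep_matches) / aligned
    return coverage >= min_cov and identity >= min_ident


def kept_queries(handle):
    """Return the query names (PSL column 10) of alignments that pass the filter."""
    kept = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # PSL header lines do not start with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        if passes_filter(fields):
            kept.append(fields[9])
    return kept


# Two invented PSL records: tx1 covers 99% of the query, tx2 only 75%.
demo = io.StringIO(
    "980\t10\t0\t0\t0\t0\t0\t0\t+\ttx1\t1000\t5\t995\tChr1\t50000\t100\t1090\t1\t990,\t5,\t100,\n"
    "700\t50\t0\t0\t0\t0\t0\t0\t+\ttx2\t1000\t0\t750\tChr1\t50000\t200\t950\t1\t750,\t0,\t200,\n"
)
kept = kept_queries(demo)
print(kept)
```

Only tx1 is retained here, since tx2 aligns over 75% of its length.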