Crataegus pinnatifida var. major Genome v1.0 Assembly & Annotation
Zhang Ticao, Qiaoqin, Du Xiao, Zhang Xiao, Hou Yali, Wei Xin, Sun Chao, Zhang Rengang, Yun Quanzheng, Crabbe M. James, Van De Peer Yves, Dong Wenxuan. The cultivated hawthorn (Crataegus pinnatifida var. major) genome sheds light on the evolution of Maleae (apple tribe). Accepted to Journal of Integrative Plant Biology. 2022.
Cultivated hawthorn (Crataegus pinnatifida var. major) is an important medicinal and edible plant with a long history of use for health protection in China. Herein, we provide a de novo chromosome-level genome sequence of the hawthorn cultivar ‘Qiu Jinxing’. We assembled an 823.41 Mb genome encoding 40,571 genes and anchored 779.24 Mb of sequence onto 17 pseudo-chromosomes, accounting for 94.64% of the assembled genome. Phylogenomic analyses revealed that cultivated hawthorn diverged from the combined clades of Malus and Pyrus approximately 11.8 Mya. Notably, the genes involved in flavonoid and triterpenoid biosynthetic pathways have been significantly amplified in the hawthorn genome. In addition, our results indicated that the Maleae (apple tribe) share a unique ancient tetraploidization event; however, no recent independent whole-genome duplication event was detected specifically in hawthorn. The amplification of long terminal repeat retrotransposons (e.g., Ty3/gypsy) contributed most to the expansion of the hawthorn genome. Furthermore, we identified two paleo-sub-genomes in extant species of Maleae and found that these two sub-genomes show different rearrangement mechanisms. The ancestral chromosomes of Rosaceae were reconstructed, and the paleo-polyploid origin of Maleae is discussed. Overall, our study provides an improved context for understanding the evolution of Maleae, and this new high-quality reference genome provides a useful resource for the horticultural improvement of hawthorn.
Homology of the Crataegus pinnatifida var. major Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. An expectation value (E-value) cutoff of 1e-9 was used for NCBI nr (Release 2021-09), and a cutoff of 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/Swiss-Prot (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09). The best-hit reports are available for download in Excel format.
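The best-hit selection described above can be approximated from raw BLAST+ tabular output. The sketch below is a minimal illustration, not the pipeline actually used to build the reports: it assumes the default 12-column `-outfmt 6` layout, assumes "best hit" means the highest bit score (ties broken by the lower E-value), and uses hypothetical database labels (`nr`, `araport11`, `swissprot`, `trembl`) and hypothetical query IDs.

```python
import csv
import io
from typing import Dict, Iterable, Tuple

# E-value cutoffs from the text, keyed by hypothetical database labels.
EVALUE_CUTOFFS = {
    "nr": 1e-9,
    "araport11": 1e-6,
    "swissprot": 1e-6,
    "trembl": 1e-6,
}

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines: Iterable[str], database: str) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)}, keeping the top-scoring hit
    per query among hits whose E-value is below the database's cutoff."""
    cutoff = EVALUE_CUTOFFS[database]
    best: Dict[str, Tuple[str, float, float]] = {}
    reader = csv.DictReader(
        lines, fieldnames=OUTFMT6_FIELDS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue >= cutoff:
            continue  # hit does not pass the database-specific cutoff
        query = row["qseqid"]
        current = best.get(query)
        # Prefer the higher bit score; on a tie, prefer the lower E-value.
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[query] = (row["sseqid"], evalue, bitscore)
    return best


if __name__ == "__main__":
    # Hypothetical hits: two for q1, one weak hit for q2 that fails the cutoff.
    demo = io.StringIO(
        "q1\ts1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
        "q1\ts2\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t120\n"
        "q2\ts3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t40\n"
    )
    print(best_hits(demo, "swissprot"))  # {'q1': ('s1', 1e-30, 150.0)}
```

Keeping the cutoffs in one dictionary makes the stricter nr threshold explicit and easy to adjust per database.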
The Crataegus pinnatifida var. major v1.0 genome gene prediction files are available in FASTA and GFF3 formats.
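A quick sanity check on a downloaded GFF3 file is to tally its feature types, for example to compare the number of `gene` features against the reported gene count. The sketch below is a minimal GFF3 reader written for illustration; the feature types present in the actual file, and the sample records in the demo, are assumptions.

```python
import io
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 features by type (column 3), skipping comment/directive
    lines and stopping at the optional ##FASTA section."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # skip malformed records instead of failing
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records; a real file would be opened with open(path).
    demo = io.StringIO(
        "##gff-version 3\n"
        "Chr01\tEVM\tgene\t1\t900\t.\t+\t.\tID=g1\n"
        "Chr01\tEVM\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
        "Chr01\tEVM\tgene\t2000\t2900\t.\t-\t.\tID=g2\n"
        "##FASTA\n>Chr01\nACGT\n"
    )
    print(count_feature_types(demo)["gene"])  # 2
```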
Functional annotations for the Crataegus pinnatifida var. major Genome v1.0 are available for download below. The Crataegus pinnatifida var. major Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
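InterProScan's tab-separated output reports one row per matched signature, so GO terms for a protein are spread across several rows. The sketch below collects them per protein. It assumes the standard TSV layout in which, when InterProScan is run with its GO lookup option (`--goterms`), column 14 holds pipe-separated GO IDs; a regular expression is used because some releases append a source label such as `(InterPro)` to each ID. Protein IDs in the demo are hypothetical.

```python
import io
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

GO_ID = re.compile(r"GO:\d{7}")


def go_terms_by_protein(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Merge GO IDs from all InterProScan TSV rows of each protein."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 14:
            continue  # no GO column on this row (e.g. run without --goterms)
        # Column 14 (index 13) holds GO IDs, or "-" when there are none.
        terms[columns[0]].update(GO_ID.findall(columns[13]))
    return dict(terms)


if __name__ == "__main__":
    demo = io.StringIO(
        "p1\tmd5\t300\tPfam\tPF00001\td\t10\t200\t1e-20\tT\t01-01-2022\tIPR000001\tdesc\tGO:0005515|GO:0016020\n"
        "p1\tmd5\t300\tSMART\tSM00001\td\t5\t90\t1e-5\tT\t01-01-2022\tIPR000002\tdesc\tGO:0005515(InterPro)\n"
        "p2\tmd5\t150\tPfam\tPF00002\td\t1\t100\t1e-9\tT\t01-01-2022\t-\t-\t-\n"
    )
    for protein, ids in go_terms_by_protein(demo).items():
        print(protein, sorted(ids))
```

Using a set per protein removes the duplicates that arise when several signatures map to the same GO term.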
Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the Crataegus pinnatifida genome assembly. Only alignments covering at least 97% of the transcript length with at least 97% identity were retained. The available files are in GFF3 format.
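The 97% length and 97% identity thresholds can be applied to BLAT's native PSL output before conversion to GFF3. The exact formulas used by the GDR Team are not stated, so the sketch below assumes simple definitions: identity = (matches + repMatches) / aligned bases, and coverage = aligned bases / qSize, where aligned bases = matches + misMatches + repMatches. These are simplifications (UCSC's own percent-identity calculation also penalizes inserts), and the transcript and chromosome names in the demo are hypothetical.

```python
from typing import Iterable, Iterator, List


def passes_thresholds(fields: List[str], min_cov: float = 0.97, min_id: float = 0.97) -> bool:
    """Apply assumed coverage/identity definitions to one 21-column PSL record."""
    matches, mismatches, rep_matches = (int(x) for x in fields[0:3])
    q_size = int(fields[10])  # column 11: query (transcript) length
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    identity = (matches + rep_matches) / aligned
    coverage = aligned / q_size
    return coverage >= min_cov and identity >= min_id


def filter_psl(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield PSL records that pass both thresholds, skipping the header block."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have 21 columns and start with the integer 'matches' count.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        if passes_thresholds(fields):
            yield fields


if __name__ == "__main__":
    demo = [
        "980\t5\t0\t0\t0\t0\t0\t0\t+\ttx1\t1000\t10\t995\tChr01\t50000000\t100\t1085\t1\t985,\t10,\t100,",
        "900\t50\t0\t0\t0\t0\t0\t0\t+\ttx2\t1000\t0\t950\tChr02\t40000000\t0\t950\t1\t950,\t0,\t0,",
    ]
    print([f[9] for f in filter_psl(demo)])  # ['tx1']
```

Here `tx2` is dropped because only 95% of its length aligns, even though its identity (about 94.7%) would also fail on its own.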