Potentilla anserina Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Potentilla anserina Genome v1.0 Assembly & Annotation
Method: FALCON (0.2.2)
Source: PacBio reads
Date performed: 2022-12-15

Publication

Gan, X.; Li, S.; Zong, Y.; Cao, D.; Li, Y.; Liu, R.; Cheng, S.; Liu, B.; Zhang, H. Chromosome-Level Genome Assembly Provides New Insights into Genome Evolution and Tuberous Root Formation of Potentilla anserina. Genes 2021, 12, 1993. https://doi.org/10.3390/genes12121993

Abstract

Potentilla anserina is a perennial stoloniferous plant with edible tuberous roots in Rosaceae, served as important food and medicine sources for Tibetans in the Qinghai-Tibetan Plateau (QTP), China, over thousands of years. However, a lack of genome information hindered the genetic study. Here, we presented a chromosome-level genome assembly using single-molecule long-read sequencing, and the Hi-C technique. The assembled genome was 454.28 Mb, containing 14 chromosomes, with contig N50 of 2.14 Mb. A total of 46,495 protein-coding genes, 169.74 Mb repeat regions, and 31.76 Kb non-coding RNA were predicted. P. anserina diverged from Potentilla micrantha ∼28.52 million years ago (Mya). Furthermore, P. anserina underwent a recent tetraploidization ∼6.4 Mya. The species-specific genes were enriched in Starch and sucrose metabolism and Galactose metabolism pathways. We identified the sub-genome structures of P. anserina, with A sub-genome was larger than B sub-genome and closer to P. micrantha phylogenetically. Despite lacking significant genome-wide expression dominance, the A sub-genome had higher homoeologous gene expression in shoot apical meristem, flower and tuberous root. The resistance genes was contracted in P. anserina genome. Key genes involved in starch biosynthesis were expanded and highly expressed in tuberous roots, which probably drives the tuber formation. The genomics and transcriptomics data generated in this study advance our understanding of the genomic landscape of P. anserina, and will accelerate genetic studies and breeding programs.

Summary of Potentilla anserina genome assembly statistics

Statistic               Value
Assembled length        454.28 Mb
Number of chromosomes   14
Contig N50              2.14 Mb
Protein-coding genes    46,495
Repeat regions          169.74 Mb
Non-coding RNA          31.76 Kb
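
For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The lengths used here are toy values chosen for illustration, not data from the P. anserina assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that brings the running total to at least
        # half of the summed length is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths (total 54, half 27): 10 + 9 + 8 reaches 27.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # prints 8
```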

 

Supplementary Table S4. Statistics of completeness validation of genome assembly. 

BUSCO eudicotyledons_odb10 (genome mode)

Validation of assembly quality   Number   Percentage (%)
Total BUSCO groups               2326     100
Complete BUSCOs                  2288     98.3
Fragmented BUSCOs                7        0.3
Missing BUSCOs                   31       1.4

Homology

Homology of the Potentilla anserina Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for NCBI nr (Release 2021-09), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
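
As an illustration of the cutoffs described above, the sketch below keeps the best hit per query from BLAST tabular output (the standard 12-column `-outfmt 6` layout, where column 11 is the E-value and column 12 is the bit score). It is an assumption that tabular output and a highest-bit-score rule are a reasonable stand-in; the page does not state how the best hit reports were actually generated, and the sequence identifiers in the toy input are hypothetical.

```python
import csv
import io

# E-value cutoffs as stated in the text; the dictionary keys are
# illustrative labels, not official database identifiers.
CUTOFFS = {"nr": 1e-9, "arabidopsis": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(handle, max_evalue):
    """Map each query to its highest-scoring hit with E-value < max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue  # hit does not pass the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy tabular records with hypothetical identifiers.
toy = io.StringIO(
    "q1\ts1\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-20\t200\n"
    "q1\ts2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t250\n"
    "q2\ts3\t50.0\t50\t20\t1\t1\t50\t1\t50\t1e-7\t40\n"
)
for query, (subject, evalue, bitscore) in best_hits(toy, CUTOFFS["nr"]).items():
    print(query, subject, evalue, bitscore)
```

With the nr cutoff only `q1` is retained in this toy input; with the 1e-6 cutoff `q2` would be kept as well.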

 

Protein Homologs

Potentilla anserina v1.0 proteins with NCBI nr homologs (EXCEL file) poa_v1.0_vs_nr.xlsx.gz
Potentilla anserina v1.0 proteins with NCBI nr homologs (FASTA file) poa_v1.0_vs_nr_hit.fasta.gz
Potentilla anserina v1.0 proteins without NCBI nr homologs (FASTA file) poa_v1.0_vs_nr_noHit.fasta.gz
Potentilla anserina v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) poa_v1.0_vs_arabidopsis.xlsx.gz
Potentilla anserina v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) poa_v1.0_vs_arabidopsis_hit.fasta.gz
Potentilla anserina v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) poa_v1.0_vs_arabidopsis_noHit.fasta.gz
Potentilla anserina v1.0 proteins with SwissProt homologs (EXCEL file) poa_v1.0_vs_swissprot.xlsx.gz
Potentilla anserina v1.0 proteins with SwissProt homologs (FASTA file) poa_v1.0_vs_swissprot_hit.fasta.gz
Potentilla anserina v1.0 proteins without SwissProt homologs (FASTA file) poa_v1.0_vs_swissprot_noHit.fasta.gz
Potentilla anserina v1.0 proteins with TrEMBL homologs (EXCEL file) poa_v1.0_vs_trembl.xlsx.gz
Potentilla anserina v1.0 proteins with TrEMBL homologs (FASTA file) poa_v1.0_vs_trembl_hit.fasta.gz
Potentilla anserina v1.0 proteins without TrEMBL homologs (FASTA file) poa_v1.0_vs_trembl_noHit.fasta.gz

 

Assembly

The Potentilla anserina Genome v1.0 assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) P.anserina_v1.0.a1.fasta.gz

 

Gene Predictions

The Potentilla anserina Genome v1.0 gene prediction files are available in GFF3 and FASTA formats.

Downloads

Gene (GFF3 file) P.anserina_v1.0.a1.gene.gff3.gz
Protein sequence (FASTA file) P.anserina_v1.0.a1.protein.fasta.gz
Transcript sequence (FASTA file) P.anserina_v1.0.a1.transcript.fasta.gz
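
A quick way to sanity-check a downloaded annotation file is to tally its feature types, for example to compare the number of gene features with the 46,495 protein-coding genes reported in the publication. The sketch below assumes a standard nine-column GFF3 file; which feature types the distributed file actually contains (for example `gene` or `mRNA`) has not been verified here.

```python
import gzip
import sys
from collections import Counter


def count_feature_types(gff3_path):
    """Count GFF3 records per feature type (column 3)."""
    opener = gzip.open if str(gff3_path).endswith(".gz") else open
    counts = Counter()
    with opener(gff3_path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # optional embedded sequence section follows
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:
                counts[fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_features.py P.anserina_v1.0.a1.gene.gff3.gz
    for feature_type, n in count_feature_types(sys.argv[1]).most_common():
        print(f"{feature_type}\t{n}")
```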

 

Functional Analysis

Functional annotations for the Potentilla anserina Genome v1.0 are available for download below. The Potentilla anserina Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan poa_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan poa_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs poa_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways poa_v1.0_KEGG-pathways.xlsx.gz

 

Transcript Alignments
Transcript alignments were performed by the GDR Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Potentilla anserina genome assembly. Alignments with an alignment length of at least 97% and at least 97% identity were retained. The available files are in GFF3 format.

 

Fragaria x ananassa GDR RefTrans v1 poa_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 poa_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 poa_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 poa_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 poa_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 poa_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 poa_v1.0_pyrus_GDR_reftransV1
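
The 97% length and identity filter mentioned for the transcript alignments can be sketched against BLAT's native PSL output as follows. The formulas are assumptions made for illustration: identity is taken as matching bases over aligned bases, and alignment length as the aligned fraction of the query transcript. The page does not say exactly how the GDR team computed either value, and the toy records are hypothetical.

```python
def passes_filter(psl_line, min_identity=0.97, min_coverage=0.97):
    """Return True if a PSL record meets both thresholds.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    10 qSize, 11 qStart, 12 qEnd.
    """
    f = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    identity = (matches + rep_matches) / aligned  # assumed definition
    coverage = (q_end - q_start) / q_size         # assumed definition
    return identity >= min_identity and coverage >= min_coverage


def make_toy_psl(matches, mismatches, q_size, q_start, q_end):
    """Build a 21-column PSL line with placeholder values (hypothetical helper)."""
    fields = [matches, mismatches, 0, 0, 0, 0, 0, 0, "+", "transcript1",
              q_size, q_start, q_end, "chr1", 100000, 0, 1000,
              1, "1000,", "0,", "0,"]
    return "\t".join(str(x) for x in fields)


good = make_toy_psl(980, 10, 1000, 0, 990)  # identity ~0.99, coverage 0.99
poor = make_toy_psl(900, 50, 1000, 0, 950)  # identity ~0.947, coverage 0.95
print(passes_filter(good), passes_filter(poor))  # prints True False
```

When reading a real PSL file, the header lines that BLAT writes by default would need to be skipped before passing records to `passes_filter`.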