Prunus dulcis Texas Genome v3.0 Assembly & Annotation

Overview
Analysis Name: Prunus dulcis Texas Genome v3.0 Assembly & Annotation
Method: Falcon (na)
Source: PacBio reads and Hi-C reads
Date performed: 2024-04-16

Citation
Castanera, R. (2024). A phased genome of the highly heterozygous 'Texas' almond uncovers patterns of allele-specific expression linked to heterozygous structural variants [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10829948

Abstract

The vast majority of traditional almond varieties are self-incompatible and the level of variability of the species is very high, resulting in a highly heterozygous genome. Therefore, information on the different haplotypes is particularly relevant to understanding the genetic basis of trait variability in this species. However, although reference genomes for several almond varieties exist, none of them is phased or provides genome information at the haplotype level. Here we present a phased genome assembly of the almond cv. Texas. This new assembly has 13% more assembled sequence than the previous version of the Texas genome and has increased contiguity, in particular in repetitive regions such as the centromeres. Our analysis shows that the “Texas” genome has a high degree of heterozygosity at the level of SNPs, short indels, and structural variants (SVs). Many of the SVs are the result of heterozygous transposable element (TE) insertions, and in many cases they also contain genic sequences. In addition to the direct consequences of this genic variability on the presence/absence of genes, our results show that variants located close to genes are often associated with allele-specific gene expression (ASE), which highlights the importance of heterozygous SVs in almond.

Table 1. Genome assembly and annotation statistics

Feature                           Texas v3.0 Phase 0   Texas v3.0 Phase 1   Texas v2.0
Assembly length (Mb)              254.02               252.65               227.59
Pseudomolecule N50 (Mb)           30.53                30.47                24.8
Number of contigs                 362                  362                  4395
Contig L50                        62                   61                   511
Contig N50 (Mb)                   1.21                 1.19                 0.104
Max. contig length (Mb)           7.01                 7.01                 1.31
Anchored to pseudomolecules (%)   98                   98                   91.47
Gap (%)                           0.01                 0.01                 1.72
LTR Assembly Index (LAI)          20.58                20.92                8.15
BUSCO complete genes (%)          96.9                 97.7                 95.4
BUSCO fragmented genes (%)        1.4                  0.9                  1
BUSCO missing genes (%)           1.7                  1.4                  3.6
Number of protein-coding genes    28625                29616                27969
Genes with Pfam domain *          22892 (79%)          23413 (79%)          21582 (77%)
Gene density (genes/Mb)           113                  117                  123
Mean CDS length (bp)              1153                 1122                 1244
Mean exons per transcript         5.3                  5.3                  5.4

* e-value < 0.05 | FDR < 5%.
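
Table 1 reports contig N50 and L50 without stating how they were computed. The sketch below uses the usual definitions (N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total length; L50 is the number of contigs in that set). The function name and the toy lengths are illustrative assumptions, not values from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths in bp."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy contig lengths (hypothetical, not taken from the Texas assembly).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(contigs))  # → (70, 2)
```

A higher N50 together with a lower L50, as seen for v3.0 against v2.0 in the table, indicates that fewer and longer contigs carry half of the assembly.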

Homology

Homology of the Prunus dulcis Texas genome v3.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.
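
As a rough illustration of how a best-hit report could be derived, the sketch below filters blastp results by the 1e-6 E-value cutoff and keeps one hit per query. It assumes the 12-column tabular output of BLAST+ (`-outfmt 6`) and picks the best hit by bitscore; the page does not say which output format or tie-breaking rule was actually used, and the protein and subject IDs are made up.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff stated in the text


def best_hits(handle, cutoff=EVALUE_CUTOFF):
    """Keep the highest-bitscore hit per query among hits with E-value < cutoff.

    Assumes BLAST+ tabular columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows: protA has two passing hits, protB has none below the cutoff.
demo = io.StringIO(
    "protA\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "protA\tsubj2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250\n"
    "protB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-3\t35\n"
)
hits = best_hits(demo)
no_hit = {"protA", "protB"} - set(hits)
print(hits["protA"][0], sorted(no_hit))  # → subj2 ['protB']
```

Queries left in `no_hit` correspond to the kind of sequences that would end up in a "noHit" FASTA file.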

Protein Homologs

P. dulcis Texas v3.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Pdulcis_Texas_v3.0_vs_arabidopsis.xlsx.gz
P. dulcis Texas v3.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) Pdulcis_Texas_v3.0_vs_arabidopsis_hit.fasta.gz
P. dulcis Texas v3.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) Pdulcis_Texas_v3.0_vs_arabidopsis_noHit.fasta.gz
P. dulcis Texas v3.0 proteins with SwissProt homologs (EXCEL file) Pdulcis_Texas_v3.0_vs_swissprot.xlsx.gz
P. dulcis Texas v3.0 proteins with SwissProt homologs (FASTA file) Pdulcis_Texas_v3.0_vs_swissprot_hit.fasta.gz
P. dulcis Texas v3.0 proteins without SwissProt homologs (FASTA file) Pdulcis_Texas_v3.0_vs_swissprot_noHit.fasta.gz
P. dulcis Texas v3.0 proteins with TrEMBL homologs (EXCEL file) Pdulcis_Texas_v3.0_vs_trembl.xlsx.gz
P. dulcis Texas v3.0 proteins with TrEMBL homologs (FASTA file) Pdulcis_Texas_v3.0_vs_trembl_hit.fasta.gz
P. dulcis Texas v3.0 proteins without TrEMBL homologs (FASTA file) Pdulcis_Texas_v3.0_vs_trembl_noHit.fasta.gz

Assembly

The P. dulcis Texas genome v3.0 assembly file is available in FASTA format.

Downloads

Chromosomes (Phase-0 FASTA file) pdulcis_Texas_v3.0.Phase-0.fasta.gz
Chromosomes (Phase-1 FASTA file) pdulcis_Texas_v3.0.Phase-1.fasta.gz
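
A minimal sketch for inspecting a downloaded assembly file: it reads FASTA lines, records each sequence's length, and counts N bases. Treating N bases as gaps is an assumption; the page does not define how the Gap (%) figure in Table 1 was calculated. The helper names are hypothetical, and the demo runs on a small in-memory example rather than the real file.

```python
import gzip


def sequence_stats(lines):
    """Return {name: (length, n_count)} for FASTA records given as text lines."""
    stats = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            stats[name] = [0, 0]
        elif name is not None:
            stats[name][0] += len(line)
            stats[name][1] += line.upper().count("N")
    return {key: tuple(value) for key, value in stats.items()}


def gap_percent(stats):
    """Percentage of N bases over all sequences (assumed gap definition)."""
    total = sum(length for length, _ in stats.values())
    gaps = sum(n_count for _, n_count in stats.values())
    return 100 * gaps / total if total else 0.0


def stats_from_gz(path):
    """Convenience wrapper for a gzipped FASTA such as the Phase-0 download."""
    with gzip.open(path, "rt") as handle:
        return sequence_stats(handle)


# Toy records (hypothetical), not real chromosome sequence.
demo = [">chr1 demo", "ACGTNNAC", "GTnn", ">chr2", "AAAA"]
demo_stats = sequence_stats(demo)
print(demo_stats, gap_percent(demo_stats))
```

For the real data, `stats_from_gz("pdulcis_Texas_v3.0.Phase-0.fasta.gz")` would return one entry per sequence in the file.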

Gene Predictions

The P. dulcis Texas genome v3.0.a1 gene prediction files are available in GFF3 format.

Downloads

  GeneID conversion between Texas v3 and Texas v2 (TXT file) geneID_conversion_between_Texasv3AndTexasv2.txt.gz
Phase-0 Genes (GFF3 file) pdulcis_Texas_v3.0.a1.Phase-0.genes.gff3.gz
  Proteins (FASTA file) pdulcis_Texas_v3.0.a1.Phase-0.protein.fasta.gz
  CDS (FASTA file) pdulcis_Texas_v3.0.a1.Phase-0.CDS.fasta.gz
  Transcripts (FASTA file) pdulcis_Texas_v3.0.a1.Phase-0.transcript.fasta.gz
  TE (GFF3 file) pdulcis_Texas_v3.0.a1.Phase-0.TE.gff3.gz
  Structural variations (VCF file) pdulcis_Texas_v3.0.a1.Phase-0.SV.vcf.gz
Phase-1 Genes (GFF3 file) pdulcis_Texas_v3.0.a1.Phase-1.genes.gff3.gz
  Proteins (FASTA file) pdulcis_Texas_v3.0.a1.Phase-1.protein.fasta.gz
  CDS (FASTA file) pdulcis_Texas_v3.0.a1.Phase-1.CDS.fasta.gz
  Transcripts (FASTA file) pdulcis_Texas_v3.0.a1.Phase-1.transcript.fasta.gz
  TE (GFF3 file) pdulcis_Texas_v3.0.a1.Phase-1.TE.gff3.gz
  Structural variations (VCF file) pdulcis_Texas_v3.0.a1.Phase-1.Phase-1.SV.vcf.gz
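
The gene counts and gene density in Table 1 can be cross-checked against the GFF3 files. The sketch below counts features per sequence and divides by assembly length. It assumes gene models use the feature type `gene` in column 3, which is the common GFF3 convention but has not been verified against these files; the function names and demo lines are hypothetical.

```python
from collections import Counter


def count_genes(lines, feature_type="gene"):
    """Count GFF3 features of the given type per sequence ID (column 1)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == feature_type:
            counts[fields[0]] += 1
    return counts


def gene_density(n_genes, assembly_mb):
    """Genes per megabase."""
    return n_genes / assembly_mb


# Toy GFF3 lines (hypothetical IDs and coordinates).
demo = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr2\tsrc\tgene\t5\t50\t.\t-\t.\tID=g2",
]
print(sum(count_genes(demo).values()))  # → 2

# Phase 0 values from Table 1: 28625 genes over 254.02 Mb.
print(round(gene_density(28625, 254.02)))  # → 113
```

The second figure matches the Phase 0 gene density reported in Table 1.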

Functional Analysis

Functional annotations for the Prunus dulcis Texas genome v3.0 are available for download below. The P. dulcis Texas genome v3.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan Pdulcis_Texas_v3.0_genes2GO.xlsx.gz
IPR assignments from InterProScan Pdulcis_Texas_v3.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Pdulcis_Texas_v3.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Pdulcis_Texas_v3.0_KEGG-pathways.xlsx.gz

Transcript Alignments

Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Prunus dulcis Texas v3.0 genome assembly. Alignments with an alignment length of at least 97% and at least 97% identity were retained. The available files are in GFF3 format.

 

Fragaria x ananassa GDR RefTrans v1 Pdulcis_Texas_v3.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 Pdulcis_Texas_v3.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Pdulcis_Texas_v3.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Pdulcis_Texas_v3.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Pdulcis_Texas_v3.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 Pdulcis_Texas_v3.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 Pdulcis_Texas_v3.0_pyrus_GDR_reftransV1
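
The 97% thresholds described for the transcript alignments could be applied to raw BLAT output roughly as follows. This is a sketch under stated assumptions: it reads BLAT's default PSL format, takes identity as (matches + repMatches) / (matches + misMatches + repMatches), and takes alignment length as the aligned fraction of the query, (qEnd - qStart) / qSize. The exact formulas used by the GDR Team are not given on this page, and `psl_row` is a hypothetical helper that only builds demo input.

```python
def passes(fields, min_identity=0.97, min_coverage=0.97):
    """Apply identity and query-coverage thresholds to one PSL record."""
    matches, mismatches, rep_matches = (int(fields[i]) for i in (0, 1, 2))
    q_size, q_start, q_end = (int(fields[i]) for i in (10, 11, 12))
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    identity = (matches + rep_matches) / aligned
    coverage = (q_end - q_start) / q_size
    return identity >= min_identity and coverage >= min_coverage


def filter_psl(lines, min_identity=0.97, min_coverage=0.97):
    """Return the PSL records (as field lists) that pass both thresholds."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip PSL header lines and anything that is not a 21-column record.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        if passes(fields, min_identity, min_coverage):
            kept.append(fields)
    return kept


def psl_row(matches, mismatches, q_size, q_start, q_end, q_name):
    """Build a 21-column PSL line for the demo (hypothetical values)."""
    span = q_end - q_start
    fields = [matches, mismatches, 0, 0, 0, 0, 0, 0, "+", q_name,
              q_size, q_start, q_end, "chr1", 1000000, 0, span,
              1, f"{span},", f"{q_start},", "0,"]
    return "\t".join(str(field) for field in fields)


demo = [
    "psLayout version 3",
    psl_row(980, 10, 1000, 0, 990, "tx_pass"),
    psl_row(900, 100, 1000, 0, 1000, "tx_low_identity"),
    psl_row(500, 0, 1000, 0, 500, "tx_low_coverage"),
]
kept_names = [fields[9] for fields in filter_psl(demo)]
print(kept_names)  # → ['tx_pass']
```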