Prunus dulcis Texas Genome v3.0 Assembly & Annotation
Overview
Publication
Castanera, R., de Tomás, C., Ruggieri, V., Vicient, C., Eduardo, I., Aranzana, M. J., Arús, P., & Casacuberta, J. M. (2024). A phased genome of the highly heterozygous 'Texas' almond uncovers patterns of allele-specific expression linked to heterozygous structural variants. Horticulture Research. https://doi.org/10.1093/hr/uhae106
Abstract
The vast majority of traditional almond varieties are self-incompatible and the level of variability of the species is very high, resulting in a highly heterozygous genome. Therefore, information on the different haplotypes is particularly relevant to understand the genetic basis of trait variability in this species. However, although reference genomes for several almond varieties exist, none of them is phased and has genome information at the haplotype level. Here we present a phased assembly of the genome of the almond cv. Texas. This new assembly has 13% more assembled sequence than the previous version of the Texas genome and has an increased contiguity, in particular in repetitive regions such as the centromeres. Our analysis shows that the 'Texas' genome has a high degree of heterozygosity at the SNP, short indel, and structural variant (SV) levels. Many of the SVs are the result of heterozygous Transposable Element (TE) insertions, and in many cases they also contain genic sequences. In addition to the direct consequences of this genic variability on the presence/absence of genes, our results show that variants located close to genes are often associated with allele-specific gene expression (ASE), which highlights the importance of heterozygous SVs in almond.
Table 1. Genome assembly and annotation statistics (The F0 gene annotation is a liftoff from the F1 phase, which was annotated de novo)
* e-value < 0.05 | FDR < 5%.
Homology
Homology of the Prunus dulcis Texas genome v3.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.
Protein Homologs
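For readers who want to reproduce a best-hit report from their own blastp run, the following is a minimal sketch. It assumes tab-separated blastp output in the standard 12-column layout (`-outfmt 6`), which is not the Excel format offered for download here, and it assumes "best hit" means the highest bit score among hits below the 1e-6 cutoff. The function name and all sequence IDs are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # cutoff stated on this page


def best_hits(handle, cutoff=EVALUE_CUTOFF):
    """Map each query ID to (subject, evalue, bitscore) of its best hit.

    Assumes 12-column tabular blastp output: query in column 1, subject in
    column 2, e-value in column 11, bit score in column 12.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # keep only hits with an e-value below the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows for illustration only (not real annotation data).
SAMPLE = (
    "geneA\tAT1G01010.1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "geneA\tAT2G02020.1\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t150\n"
    "geneB\tAT3G03030.1\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.001\t40\n"
)

hits = best_hits(io.StringIO(SAMPLE))
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

In this sample, geneB has no hit below the cutoff, so it is absent from the result.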
Assembly
The P. dulcis Texas genome v3.0 assembly file is available in FASTA format. The F0 gene annotation is a liftoff from the F1 phase, which was annotated de novo.
Downloads
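Once the FASTA file is downloaded, basic contiguity figures can be recomputed locally. The sketch below reads sequence lengths from FASTA text and computes the total length and N50; it is an illustration only, the function names are hypothetical, and the toy sequences are not taken from the assembly.

```python
import io


def read_lengths(handle):
    """Return {sequence_name: length} from FASTA-formatted text."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy FASTA for illustration only.
SAMPLE = ">seq1 example\nACGTACGT\n>seq2\nACG\nTAC\n>seq3\nACGT\n>seq4\nAC\n"

seq_lengths = read_lengths(io.StringIO(SAMPLE))
total_length = sum(seq_lengths.values())
assembly_n50 = n50(seq_lengths.values())
print(total_length, assembly_n50)
```

For a real assembly, replace `io.StringIO(SAMPLE)` with an opened file handle; a gzip-compressed download would need `gzip.open(path, "rt")` instead of `open`.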
Gene Predictions
The P. dulcis Texas genome v3.0.a1 gene prediction files are available in GFF3 format. The F0 gene annotation is a liftoff from the F1 phase, which was annotated de novo.
Downloads
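A quick way to sanity-check a downloaded GFF3 file is to count feature types and transcripts per gene. The sketch below relies only on the generic GFF3 layout (nine tab-separated columns, `key=value` attributes separated by semicolons). The feature IDs are hypothetical, and the assumption that transcripts are typed `mRNA` with a `Parent` attribute may not match every file.

```python
from collections import Counter
import io


def parse_attributes(field):
    """Split a GFF3 attribute column into a dict."""
    attrs = {}
    for pair in field.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def summarize_gff3(handle):
    """Return (feature type counts, mRNA count per parent gene)."""
    type_counts = Counter()
    transcripts_per_gene = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section ends the feature table
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        type_counts[cols[2]] += 1
        if cols[2] == "mRNA":
            parent = parse_attributes(cols[8]).get("Parent")
            if parent:
                transcripts_per_gene[parent] += 1
    return type_counts, transcripts_per_gene


# Hypothetical records for illustration only.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=geneX\n"
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=geneX.1;Parent=geneX\n"
    "chr1\t.\tmRNA\t100\t800\t.\t+\t.\tID=geneX.2;Parent=geneX\n"
    "chr1\t.\tCDS\t150\t700\t.\t+\t0\tID=cds1;Parent=geneX.1\n"
)

type_counts, transcripts_per_gene = summarize_gff3(io.StringIO(SAMPLE))
print(dict(type_counts))
print(dict(transcripts_per_gene))
```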
Functional Analysis
Functional annotations for the Prunus dulcis Texas genome v3.0 are available for download below. The P. dulcis Texas genome v3.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
Downloads
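To illustrate how GO terms can be collected per protein, the sketch below parses InterProScan's tab-separated output, in which GO annotations appear in column 14 as a pipe-separated list when GO lookup is enabled. This is an assumption about the input: the files offered for download here may be laid out differently. The protein IDs and the MD5 placeholder are hypothetical.

```python
from collections import defaultdict
import io


def go_terms_by_protein(handle):
    """Collect the set of GO terms assigned to each protein.

    Assumes InterProScan TSV output: protein ID in column 1 and a
    pipe-separated GO list in column 14, with '-' meaning no annotation.
    """
    terms = defaultdict(set)
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14 or cols[13] in ("", "-"):
            continue  # row carries no GO annotation
        for term in cols[13].split("|"):
            # Some versions append a source tag such as "(InterPro)"; drop it.
            term = term.split("(")[0].strip()
            if term.startswith("GO:"):
                terms[cols[0]].add(term)
    return dict(terms)


# Hypothetical rows for illustration only.
SAMPLE = (
    "prot1\tmd5\t300\tPfam\tPF00069\tProtein kinase domain\t10\t250\t1e-40\tT"
    "\t01-01-2024\tIPR000719\tProtein kinase domain\tGO:0004672|GO:0005524\t-\n"
    "prot1\tmd5\t300\tSMART\tSM00220\tKinase\t10\t250\t1e-30\tT"
    "\t01-01-2024\tIPR000719\tProtein kinase domain\tGO:0005524(InterPro)|GO:0006468(InterPro)\t-\n"
    "prot2\tmd5\t120\tCoils\tCoil\tCoil\t5\t40\t-\tT\t01-01-2024\t-\t-\t-\t-\n"
)

go_map = go_terms_by_protein(io.StringIO(SAMPLE))
for protein, go_set in go_map.items():
    print(protein, sorted(go_set))
```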
Transcript Alignments
Transcript alignments were performed by the GDR Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Prunus dulcis Texas v3.0 genome assembly. Alignments with an alignment length of 97% and 97% identity were preserved. The available files are in GFF3 format.
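The filtering step can be sketched as follows, assuming BLAT's default PSL output. The exact definitions used by the GDR Team are not stated, so this sketch makes three assumptions: the 97% values are minimum thresholds, "alignment length" means aligned bases as a fraction of the transcript length, and identity means matching bases as a fraction of aligned bases. Transcript names and numbers are hypothetical.

```python
MIN_COVERAGE = 0.97  # assumed: minimum aligned fraction of the transcript
MIN_IDENTITY = 0.97  # assumed: minimum fraction of aligned bases that match


def psl_metrics(line):
    """Return (query_name, coverage, identity) for one PSL record."""
    f = line.rstrip("\n").split("\t")
    # PSL columns: 1 matches, 2 misMatches, 3 repMatches, 10 qName, 11 qSize.
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_name, q_size = f[9], int(f[10])
    aligned = matches + mismatches + rep_matches
    identity = (matches + rep_matches) / aligned if aligned else 0.0
    coverage = aligned / q_size if q_size else 0.0
    return q_name, coverage, identity


def passes_filter(line):
    """True when the record meets both assumed thresholds."""
    _, coverage, identity = psl_metrics(line)
    return coverage >= MIN_COVERAGE and identity >= MIN_IDENTITY


def make_psl(matches, mismatches, q_name, q_size):
    """Build a 21-column PSL line for the demo (values are hypothetical)."""
    aligned = matches + mismatches
    fields = [matches, mismatches, 0, 0, 0, 0, 0, 0, "+", q_name, q_size,
              0, aligned, "chr1", 50000, 100, 100 + aligned, 1,
              f"{aligned},", "0,", "100,"]
    return "\t".join(map(str, fields))


records = [
    make_psl(990, 5, "tx1", 1000),    # coverage 0.995, identity ~0.995
    make_psl(800, 100, "tx2", 1000),  # coverage 0.90: too short
    make_psl(940, 50, "tx3", 1000),   # coverage 0.99, identity ~0.949: too divergent
]

kept = [psl_metrics(r)[0] for r in records if passes_filter(r)]
print(kept)
```

Both thresholds must be met, so a long but divergent alignment (tx3) is dropped just like a short one (tx2).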