Malus fusca Genome v1.1 Assembly & Annotation

Overview
Analysis Name: Malus fusca Genome v1.1 Assembly & Annotation
Method: hifiasm (v0.16.1)
Source: Malus fusca PacBio HiFi reads
Date performed: 2023-02-01

*The whole-genome data were first released publicly as v1.0 and were updated to v1.1 with the publication. The only difference between the two versions is that a TE annotation column was added to the GFF files.

Publication

Mansfeld BN, Yocca A, Ou S, Harkess A, Burchard E, Gutierrez B, van Nocker S, Gottschalk C. A haplotype resolved chromosome-scale assembly of North American wild apple Malus fusca and comparative genomics of the fire blight Mfu10 locus. The Plant Journal: for Cell and Molecular Biology. 2023 Aug 28. https://onlinelibrary.wiley.com/doi/10.1111/tpj.16433

Description

The Pacific crabapple (Malus fusca) is a wild relative of commercial apple (Malus x domestica). With a range extending from Alaska to Northern California, M. fusca is extremely hardy and disease resistant, making it an untapped genetic resource for resistance to many biotic and abiotic stresses that could benefit apple breeding and rootstock development. However, gene discovery and utilization in M. fusca have been hampered by the lack of genomic resources. Here, we present a high-quality, haplotype-resolved, chromosome-scale genome assembly and annotation of M. fusca. The genome was assembled using high-fidelity long reads and scaffolded using genetic maps and high-throughput chromatin conformation sequencing, resulting in one of the most contiguous apple genomes to date. We annotated the genome using public transcriptomic data of the same species from diverse tissue types and developmental stages.

Genome assembly statistics

Assembly | Length (bp) | Number of contigs | N50 (bp) | Contigs >1 Mb | % of the estimated size | Merqury QV | Merqury QV error rate | Merqury k-mer completeness | % BUSCO (complete) | Complete single-copy BUSCOs | Complete duplicated BUSCOs | Fragmented BUSCOs | Missing BUSCOs

HiFi contigs
Haplotype 1 | 682,253,989 | 1141 | 18,869,149 | 48 | 98.31% | 62.39 | 5.77E-07 | 86.82% | - | - | - | - | -
Haplotype 2 | 644,255,151 | 273 | 21,444,431 | 41 | 92.83% | 67.14 | 1.93E-07 | 86.51% | - | - | - | - | -
Combined | 1,326,509,140 | 1414 | - | - | - | 64.09 | 3.90E-07 | 98.21% | - | - | - | - | -

Scaffolds
Haplotype 1 | 682,257,189 | 1109 | 36,508,756 | 18 | 98.31% | - | - | - | - | - | - | - | -
Haplotype 2 | 644,257,951 | 245 | 36,111,528 | 17 | 92.84% | - | - | - | - | - | - | - | -

Decontaminated assembly
Haplotype 1 | 651,182,355 | 235 | 36,779,757 | 18 | 93.83% | - | - | - | 98.70% | 993 | 600 | 11 | 10
Haplotype 2 | 637,756,398 | 97 | 36,111,528 | 17 | 91.90% | - | - | - | 98.82% | 992 | 603 | 11 | 8

Chromosome-only assembly
Haplotype 1 | 637,459,684 | 17 | 36,779,757 | 17 | 91.86% | - | - | - | 98.76% | 996 | 598 | 11 | 9
Haplotype 2 | 631,922,950 | 17 | 36,111,528 | 17 | 91.06% | - | - | - | 98.88% | 994 | 602 | 11 | 7

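Several columns in the statistics table are derived from one another. Merqury reports QV on the Phred scale, QV = -10 log10(E), where E is the estimated per-base error rate, so the QV and error-rate columns carry the same information. The % BUSCO value is complete BUSCOs (single-copy plus duplicated) divided by all BUSCOs searched; the four count columns sum to 1,614 in every row that has them. The Python sketch below reproduces those derived values from the table and adds a generic N50 function. The function names are illustrative only and are not part of Merqury or BUSCO; the "% of the estimated size" column is not reproduced because the estimated genome size is not given on this page.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Phred-scaled quality value (QV) -> estimated per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Estimated per-base error rate -> Phred-scaled QV."""
    return -10 * math.log10(error_rate)


def busco_complete_percent(single: int, duplicated: int,
                           fragmented: int, missing: int) -> float:
    """Complete BUSCOs (single-copy + duplicated) as a % of all BUSCOs searched."""
    total = single + duplicated + fragmented + missing
    return 100 * (single + duplicated) / total


def n50(lengths: list[int]) -> int:
    """Length of the sequence at which the cumulative length, longest first,
    reaches at least half of the total assembly length."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Haplotype 1 HiFi contigs: QV 62.39
print(f"{qv_to_error_rate(62.39):.2E}")  # 5.77E-07
# Decontaminated Haplotype 1: 993 single, 600 duplicated, 11 fragmented, 10 missing
print(f"{busco_complete_percent(993, 600, 11, 10):.2f}%")  # 98.70%
```

The same conversion gives 1.93E-07 for QV 67.14 (Haplotype 2) and a QV of about 64.09 for the combined error rate of 3.90E-07, matching the table.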
Homology

Homology of the Malus fusca Genome v1.1 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. Hits were retained at an expectation value (E-value) cutoff of less than 1e-9 for NCBI nr (Release 2021-09) and less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09). The best-hit reports are available for download in Excel format.
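The E-value filtering and best-hit selection described above can be sketched in Python over BLAST+ tabular output (-outfmt 6, whose default 12 columns are listed in the code). The cutoffs are taken from the text; choosing the best hit by highest bit score, the dictionary keys, and the protein IDs in the usage rows are assumptions for illustration, not the exact procedure used for this dataset. The same best-hit table also partitions the proteome into "hit" and "noHit" sets, which is how the FASTA downloads are split.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# E-value cutoffs from the text; the dictionary keys are hypothetical labels.
EVALUE_CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def read_outfmt6(path):
    """Yield one dict per hit from a BLAST+ -outfmt 6 file."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle, fieldnames=OUTFMT6_FIELDS, delimiter="\t")


def best_hits(rows, cutoff):
    """Per query, keep the highest-bitscore hit among hits with E-value < cutoff."""
    best = {}
    for row in rows:
        if float(row["evalue"]) >= cutoff:
            continue
        query = row["qseqid"]
        if query not in best or float(row["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = row
    return best


def split_hit_nohit(protein_ids, best):
    """Partition protein IDs into those with and without a retained best hit."""
    hits = [pid for pid in protein_ids if pid in best]
    no_hits = [pid for pid in protein_ids if pid not in best]
    return hits, no_hits


# Usage with in-memory rows (only the fields the functions read are filled in).
rows = [
    {"qseqid": "prot1", "sseqid": "A", "evalue": "1e-20", "bitscore": "80"},
    {"qseqid": "prot1", "sseqid": "B", "evalue": "1e-30", "bitscore": "120"},
    {"qseqid": "prot2", "sseqid": "C", "evalue": "1e-7", "bitscore": "40"},
]
nr_best = best_hits(rows, EVALUE_CUTOFFS["nr"])
print({q: r["sseqid"] for q, r in nr_best.items()})  # {'prot1': 'B'}
print(split_hit_nohit(["prot1", "prot2", "prot3"], nr_best))  # (['prot1'], ['prot2', 'prot3'])
```

Note that prot2 fails the stricter nr cutoff (1e-7 is not below 1e-9) but would pass the 1e-6 cutoff used for the other databases.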

Downloads

Malus fusca v1.1 proteins with NCBI nr homologs (EXCEL file) Mfusca_v1.1_vs_nr.xlsx.gz
Malus fusca v1.1 proteins with NCBI nr homologs (FASTA file) Mfusca_v1.1_vs_nr_hit.fasta.gz
Malus fusca v1.1 proteins without NCBI nr homologs (FASTA file) Mfusca_v1.1_vs_nr_noHit.fasta.gz
Malus fusca v1.1 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Mfusca_v1.1_vs_arabidopsis.xlsx.gz
Malus fusca v1.1 proteins with Arabidopsis (Araport11) homologs (FASTA file) Mfusca_v1.1_vs_arabidopsis_hit.fasta.gz
Malus fusca v1.1 proteins without Arabidopsis (Araport11) homologs (FASTA file) Mfusca_v1.1_vs_arabidopsis_noHit.fasta.gz
Malus fusca v1.1 proteins with SwissProt homologs (EXCEL file) Mfusca_v1.1_vs_swissprot.xlsx.gz
Malus fusca v1.1 proteins with SwissProt homologs (FASTA file) Mfusca_v1.1_vs_swissprot_hit.fasta.gz
Malus fusca v1.1 proteins without SwissProt homologs (FASTA file) Mfusca_v1.1_vs_swissprot_noHit.fasta.gz
Malus fusca v1.1 proteins with TrEMBL homologs (EXCEL file) Mfusca_v1.1_vs_trembl.xlsx.gz
Malus fusca v1.1 proteins with TrEMBL homologs (FASTA file) Mfusca_v1.1_vs_trembl_hit.fasta.gz
Malus fusca v1.1 proteins without TrEMBL homologs (FASTA file) Mfusca_v1.1_vs_trembl_noHit.fasta.gz

Assembly

The Malus fusca Genome v1.1 assembly files are available in GFF3 and FASTA format.

Downloads

Chromosomes (Hap1) (FASTA file) Mfusca_v1.1_hap1.fasta.gz
TEs (Hap1) (GFF3 file) Mfusca_v1.1_hap1_TE.gff3.gz
Chromosomes (Hap2) (FASTA file) Mfusca_v1.1_hap2.fasta.gz
TEs (Hap2) (GFF3 file) Mfusca_v1.1_hap2_TE.gff3.gz
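The assembly FASTA files are distributed gzip-compressed. A minimal Python sketch for listing sequence names and lengths directly from a compressed (or uncompressed) FASTA file, using only the standard library, is shown here. It takes each sequence ID to be the first whitespace-delimited word of the header line; the function name is illustrative.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_id: length} for a plain or gzip-compressed FASTA file.

    The ID is the first whitespace-delimited word after '>'; every residue
    character on the following lines (including N) counts toward the length.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = {}
    current = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths
```

For example, `fasta_lengths("Mfusca_v1.1_hap2.fasta.gz")` returns one entry per sequence. If that file holds the chromosome-only Haplotype 2 assembly from the statistics table (an assumption, since the page does not say so explicitly), one would expect 17 sequences summing to 631,922,950 bp.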


Gene Predictions

The Malus fusca v1.1 genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (Hap1) (GFF3 file) Mfusca_v1.1_hap1.genes.gff3.gz
Genes (Hap2) (GFF3 file) Mfusca_v1.1_hap2.genes.gff3.gz
Protein sequences (Hap1) (FASTA file) Mfusca_v1.1_hap1.pep.fasta.gz
Protein sequences (Hap2) (FASTA file) Mfusca_v1.1_hap2.pep.fasta.gz
CDS sequences (Hap1) (FASTA file) Mfusca_v1.1_hap1.cds.fasta.gz
CDS sequences (Hap2) (FASTA file) Mfusca_v1.1_hap2.cds.fasta.gz
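A quick way to check what a gene-prediction file contains is to tally GFF3 feature types (column 3). The minimal standard-library Python sketch below does that for plain or gzip-compressed GFF3 and stops at a ##FASTA section if one is present. Which feature types these files actually use (for example gene, mRNA, exon, CDS, UTRs) is not stated on this page, so the output is what to inspect.

```python
import gzip
from collections import Counter


def feature_type_counts(path):
    """Count GFF3 feature types (column 3) in a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section ends the feature lines
            if line.startswith("#") or not line.strip():
                continue  # directives, comments, blank lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:  # well-formed GFF3 feature lines have 9 columns
                counts[columns[2]] += 1
    return counts
```

For example, `feature_type_counts("Mfusca_v1.1_hap1.genes.gff3.gz")["gene"]` would give the number of gene features annotated on haplotype 1.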


Functional Analysis

Functional annotations for the Malus fusca Genome v1.1 are available for download below. The Malus fusca Genome v1.1 proteins were analyzed with InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan Mfusca_v1.1_genes2GO.xlsx.gz
IPR assignments from InterProScan Mfusca_v1.1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Mfusca_v1.1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Mfusca_v1.1_KEGG-pathways.xlsx.gz
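The GO and IPR tables come from InterProScan. To illustrate how such per-protein assignments can be collected, here is a Python sketch that reads InterProScan's tab-separated output. It assumes the standard TSV layout: protein accession in column 1, InterPro accession in column 12 (when run with --iprlookup), and pipe-separated GO terms in column 14 (when run with --goterms), with "-" for empty values. This is not the pipeline that produced the Excel files, whose column layout is not described on this page.

```python
import re
from collections import defaultdict


def parse_interproscan_tsv(lines):
    """Collect InterPro accessions and GO terms per protein from InterProScan TSV lines.

    Assumed layout: column 1 protein accession, column 12 InterPro accession,
    column 14 pipe-separated GO terms. Empty values are written as "-".
    """
    protein2ipr = defaultdict(set)
    protein2go = defaultdict(set)
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 12:
            continue  # no InterPro column on this line
        protein = columns[0]
        if columns[11] not in ("", "-"):
            protein2ipr[protein].add(columns[11])
        if len(columns) >= 14 and columns[13] not in ("", "-"):
            for term in columns[13].split("|"):
                # Some releases append the source database, e.g. "GO:0005524(InterPro)".
                protein2go[protein].add(re.sub(r"\(.*\)$", "", term))
    return dict(protein2ipr), dict(protein2go)
```

Sets are used so that the same GO term or InterPro entry reported by several member-database matches is counted once per protein.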

Transcript Alignments
Transcript alignments were performed by the GDR team of the Main Bioinformatics Lab at WSU. BLAT was used to map transcripts to the Malus fusca genome assembly, and alignments with at least 97% alignment length and 97% identity were retained. The available files are in GFF3 format.
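The 97% thresholds can be applied to BLAT's native PSL output (21 tab-separated columns per alignment) roughly as follows. How the GDR team defined "alignment length" and "identity" is not stated, so this Python sketch assumes one common pair of definitions: identity = (matches + repMatches) / (matches + repMatches + misMatches), and alignment length as the fraction of the transcript length (qSize) covered by aligned bases. Converting the filtered PSL to GFF3 is a further step not shown.

```python
def psl_passes(fields, min_identity=0.97, min_coverage=0.97):
    """Apply identity and query-coverage thresholds to one PSL record (21 fields)."""
    matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
    q_size = int(fields[10])
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    identity = (matches + rep_matches) / aligned  # assumed definition of identity
    coverage = aligned / q_size                   # assumed definition of "alignment length"
    return identity >= min_identity and coverage >= min_coverage


def filter_psl(lines, min_identity=0.97, min_coverage=0.97):
    """Yield PSL lines that pass both thresholds, skipping psLayout header lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 21 or not fields[0].isdigit():
            continue  # header, separator, or malformed line
        if psl_passes(fields, min_identity, min_coverage):
            yield line
```

A transcript of 1,000 bp aligned with 980 matches and 10 mismatches passes (about 99.0% identity and 99.0% coverage), while one with only 900 aligned bases fails the coverage threshold.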

Fragaria x ananassa GDR RefTrans v1 Mfusca_v1.1_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 Mfusca_v1.1_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Mfusca_v1.1_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Mfusca_v1.1_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Mfusca_v1.1_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 Mfusca_v1.1_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 Mfusca_v1.1_pyrus_GDR_reftransV1