Malus x domestica HFTH1 Whole Genome v1.0
Liyi Zhang, Jiang Hu, Xiaolei Han, Jingjing Li, Yuan Gao, Christopher M. Richards, Caixia Zhang, Yi Tian, Guiming Liu, Hera Gul, Dajiang Wang, Yu Tian, Chuanxin Yang, Minghui Meng, Gaopeng Yuan, Guodong Kang, Yonglong Wu, Kun Wang, Hengtao Zhang, Depeng Wang & Peihua Cong. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nature Communications. 2019 April 02.
About the Assembly
The authors assembled a high-quality genome (contig N50 of 6.99Mb) of the apple anther-derived homozygous line HFTH1 and showed that the extensive genomic variations are largely attributable to the activity of transposable elements.
Sequencing, Assembly, and Annotation
The sequencing data comprised PacBio single-molecule long reads (77Gb with an average length of 13.1kb), 66-fold Illumina paired-end short reads (43.3Gb), 224-fold optical map data (147.8Gb with an average length of 178.9kb) and 145-fold Hi-C data. The assembly was performed in a stepwise fashion. The initial assembly of the PacBio-only data produced a 656.52Mb assembly with a contig N50 of 4.63Mb. The initial contigs were polished with PacBio long reads and Illumina short reads. The polished contigs were then scaffolded using optical map data; during this step, four contigs containing conflicting connections were identified and split to resolve the conflicts, and 58.5% of the gaps introduced in this step were closed by a subsequent gap-filling procedure. Finally, scaffolding with Hi-C data allowed the accurate clustering and ordering of 17 pseudo-chromosomes covering the 658.90Mb assembly, with a contig N50 of 6.99Mb and a maximum contig length of 18.01Mb. The assembly size was close to the estimated genome size of GDDH13, but represented 92.99% of the genome size estimated for HFTH1 by k-mer analysis (708.54Mb), and ~97.89% of the HFTH1 Illumina reads could be mapped to the assembly. In addition, the 160,068bp chloroplast genome and the 396,939bp mitochondrial genome were each assembled as a single complete contig.
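The two summary statistics quoted above, contig N50 and the share of the estimated genome size covered by the assembly, can be illustrated with a minimal Python sketch. The function names and the toy contig lengths are hypothetical and are not part of the published pipeline; only the 658.90Mb and 708.54Mb figures come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_of_estimate(assembly_mb, estimated_mb):
    """Assembly size expressed as a percentage of the estimated genome size."""
    return 100.0 * assembly_mb / estimated_mb


# Toy contig lengths (hypothetical, not HFTH1 data).
toy_contigs = [8, 6, 5, 3, 2, 1]
print(n50(toy_contigs))

# Figures quoted in the text: 658.90Mb assembly, 708.54Mb k-mer estimate.
print(round(percent_of_estimate(658.90, 708.54), 2))
```

For real data, the list of lengths would be read from the contig FASTA file; the calculation itself stays the same.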
Homology of the Malus x domestica HFTH1 genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05), and a cutoff of less than 1e-6 for the Arabidopsis proteins (TAIR10), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
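As an illustration of how a best-hit report can be derived under an E-value cutoff, the following Python sketch keeps one hit per query protein. It assumes the blastp results are in the standard 12-column tabular format (E-value in column 11, bit score in column 12); the actual format and tie-breaking rule used for the distributed reports are not stated here, and the sequence names in the example are hypothetical.

```python
def best_hits(lines, evalue_cutoff):
    """Return {query: (subject, evalue, bitscore)} for the best hit of each
    query whose E-value is strictly below evalue_cutoff."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= evalue_cutoff:
            continue  # hit does not pass the cutoff
        current = best.get(query)
        # Assumed ranking: lowest E-value first, then highest bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows: query, subject, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
rows = [
    "geneA\tsubj1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t180",
    "geneA\tsubj2\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-80\t250",
    "geneB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t45",
]
nr_like = best_hits(rows, 1e-9)      # cutoff used for NCBI nr
swiss_like = best_hits(rows, 1e-6)   # cutoff used for the other databases
print(sorted(nr_like), sorted(swiss_like))
```

With the stricter 1e-9 cutoff the weak geneB hit is dropped, while the 1e-6 cutoff retains it.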
All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.
The Malus x domestica HFTH1 v1.0 genome gene prediction files are available in FASTA and GFF3 formats.
Functional annotation for the Malus x domestica HFTH1 genome v1.0 is available for download below. The Malus x domestica HFTH1 genome v1.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Malus x domestica genome assembly. Alignments with an alignment length of at least 97% and at least 97% identity were retained. The available files are in GFF3 format.
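The 97% filter can be sketched in Python as follows. The exact definitions used by the GDR Team are not given, so this sketch assumes that alignment length means the aligned fraction of the transcript and that identity means matches divided by matches plus mismatches; the Alignment record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    name: str          # transcript identifier
    query_size: int    # full transcript length in bases
    query_start: int   # start of aligned region on the transcript (0-based)
    query_end: int     # end of aligned region on the transcript (exclusive)
    matches: int       # aligned bases that match the genome
    mismatches: int    # aligned bases that do not match


def passes_filter(aln, min_coverage=0.97, min_identity=0.97):
    """True if the alignment spans enough of the transcript and is
    similar enough to the genome (both definitions are assumptions)."""
    compared = aln.matches + aln.mismatches
    if aln.query_size <= 0 or compared <= 0:
        return False
    coverage = (aln.query_end - aln.query_start) / aln.query_size
    identity = aln.matches / compared
    return coverage >= min_coverage and identity >= min_identity


# Hypothetical alignments illustrating the three possible outcomes.
alignments = [
    Alignment("t1", 1000, 0, 990, 980, 10),   # long and similar: kept
    Alignment("t2", 1000, 0, 900, 895, 5),    # covers only 90%: dropped
    Alignment("t3", 1000, 0, 1000, 950, 50),  # only 95% identity: dropped
]
kept = [a.name for a in alignments if passes_filter(a)]
print(kept)
```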