Prunus cerasus cv. 'Montmorency' haploid Whole Genome v1.0 Assembly & Annotation
Three collections of young leaf tissue were made in the spring of 2019 for PacBio sequencing, Illumina sequencing of a gDNA library, and Illumina sequencing of a Hi-C library. Canu was used to assemble approximately 100X coverage of PacBio long reads into contigs. Subsequently, Illumina 150 bp paired-end gDNA reads (56X coverage) were aligned to the assembly using Bowtie 2, and these alignments were used to polish the initial assembly with Pilon. After polishing, the assembly was scaffolded using Illumina 150 bp paired-end Hi-C reads and the Juicer + 3D-DNA pipeline. Misassemblies in the scaffolded chromosome-scale sequences were corrected in Juicebox Assembly Tools.
For annotation, Illumina 150 bp paired-end RNA-seq was conducted on a variety of tissues from cultivar 'Montmorency' at different developmental time points to aid gene discovery. In addition, Nanopore cDNA sequencing was conducted on a diverse pool of tissues in an attempt to obtain full-length transcripts. Manually curated protein databases were downloaded from UniProt and arabidopsis.org and aligned to the assembly with Exonerate to make homology-based gene predictions. The RNA-seq and cDNA sequencing reads were aligned to the assembly using STAR and minimap2, respectively, and a transcriptome was assembled from each data type using StringTie.
The two transcriptomes and the protein alignments were given to MAKER for evidence-based gene prediction. After MAKER's first run, the resulting .gff3 was used to train the gene finders Augustus and SNAP. MAKER was then run a second time with these evidence-trained gene finders, and the resulting gene predictions were filtered so that each contained at least one known Pfam domain. Next, deFusion was run to identify candidate erroneously fused genes; more than 2500 gene predictions on the 16 main chromosomes of the assembly were defused as a result of this pipeline. At this point, predictions were filtered again, both to retain those containing Pfam domains and to exclude those with homology to transposable elements. Lastly, Apollo was used to manually annotate the intron-exon structure of 67 genes, including all identified DAM (Dormancy Associated MADS-box) genes.
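The final filtering rule described above (keep a prediction only if it has at least one Pfam domain and no homology to transposable elements) can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function name, the gene IDs, the accession strings, and the in-memory inputs are all hypothetical, and it assumes domain hits and TE hits have already been parsed from whichever search tools were used.

```python
def filter_predictions(gene_ids, pfam_hits, te_hits):
    """Return gene IDs that have >= 1 Pfam domain and no TE homology.

    gene_ids  : iterable of gene prediction IDs (hypothetical names)
    pfam_hits : dict mapping gene ID -> set of Pfam accessions found
    te_hits   : set of gene IDs with homology to transposable elements
    """
    kept = []
    for gene_id in gene_ids:
        # A gene missing from pfam_hits is treated as having no domains.
        has_domain = len(pfam_hits.get(gene_id, ())) > 0
        if has_domain and gene_id not in te_hits:
            kept.append(gene_id)
    return kept


# Toy inputs for illustration only; the accessions are placeholder strings.
gene_ids = ["gene1", "gene2", "gene3", "gene4"]
pfam_hits = {
    "gene1": {"PF00001"},   # has a domain, no TE hit -> kept
    "gene2": set(),         # no domain -> dropped
    "gene3": {"PF00002"},   # has a domain but is a TE hit -> dropped
}
te_hits = {"gene3"}

kept = filter_predictions(gene_ids, pfam_hits, te_hits)
print(kept)
```

Treating a gene that is absent from the domain table as having no domains is a design choice of this sketch; a real workflow might instead flag such genes as missing data.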
Tentative title: 'The genome of the segmental allotetraploid sour cherry (Prunus cerasus) for genomics-assisted breeding'. Authors: Goeckeritz, C. Z., Rhoades, K. B., Childs, K., Iezzoni, A. F., VanBuren, R., Hollender, C. A.
The Prunus cerasus Montmorency v1.0 genome gene prediction files are available in FASTA and GFF3 formats.
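As a quick sanity check after downloading a GFF3 file, one might count gene features per sequence. The sketch below uses only the Python standard library and relies on the standard nine-column, tab-delimited GFF3 layout. The sequence names, source label, and coordinates in the embedded example are invented for illustration and do not come from the Montmorency annotation.

```python
import io
from collections import Counter


def count_genes_per_sequence(handle):
    """Count features of type 'gene' per sequence ID in a GFF3 stream."""
    counts = Counter()
    for line in handle:
        # An optional embedded FASTA section ends the annotation lines.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, feature_type = cols[0], cols[2]
        if feature_type == "gene":
            counts[seqid] += 1
    return counts


# Hypothetical miniature GFF3 content, for illustration only.
example = io.StringIO(
    "##gff-version 3\n"
    "chrA\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "chrA\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "chrA\tmaker\tgene\t1500\t2400\t.\t-\t.\tID=g2\n"
    "chrB\tmaker\tgene\t50\t700\t.\t+\t.\tID=g3\n"
)

gene_counts = count_genes_per_sequence(example)
for seqid, n in sorted(gene_counts.items()):
    print(seqid, n)
```

To run it on a real file, replace the `io.StringIO` object with an opened file handle, for example `open(path)` where `path` points to the downloaded GFF3.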