|Analysis Name|Rubus idaeus 'Joan J' Genome v2.0 Assembly & Annotation|
|Method|Canu, Racon, and Pilon|
|Source|DNA-seq from Nanopore, PacBio, and Illumina (submitted to SRA with BioProject ID PRJNA869453)|
|Date performed|2022-09-15|
Zhou, Junhui, Li, Muzi, Li, Yongping, Xiao, Yuwei, Luo, Xi, Gao, Shenglan, Sadowski, Norah, Timp, Winston, Mount, Stephen M., and Liu, Zhongchi. Comparative analyses of red raspberry and strawberry fruit development reveal diverse mechanisms for different fruit types (Submitted).
Canu was used to assemble the Nanopore reads into draft contigs. The contigs were first polished with Racon using Nanopore and PacBio reads, and then corrected with Pilon using Illumina reads. A haplotype-fused assembly was constructed with the Purge Haplotigs pipeline using the Nanopore reads. The Nanopore reads were then used to build super-scaffolds with SSPACE-LongRead and to fill gaps with GapFinisher. The resulting genome assembly was corrected again with Pilon using Illumina reads. Finally, the pseudochromosomes were constructed with ALLMAPS.
LTR-retriever and RepeatModeler were used to build a de novo repeat library, which was then supplied to RepeatMasker for repeat annotation and masking. Genome annotation combined ab initio gene models, transcript evidence (RNA-Seq data), and protein homology-based evidence. Candidate gene models in the repeat-masked genome were predicted by MAKER, AUGUSTUS, and BRAKER2. EVM was used to generate confident consensus gene models from the gene models produced by the three predictors, the transcript evidence, and the protein evidence, using nonstochastic weight values. The EVM models were then improved with PASA and manually curated in Apollo.
The final genome assembly has a total length of 297,436,202 bp with a scaffold N50 of 33.74 Mb. A total of 33,865 protein-coding genes were identified in the genome, with a BUSCO completeness score of 96.5%.
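For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the 'Joan J' assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy example (hypothetical lengths, not real scaffold sizes):
# total = 150, half = 75; 50 + 40 = 90 >= 75, so the N50 is 40.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Applied to the scaffold lengths of a real assembly, the same function would give the scaffold N50 in base pairs.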
The Rubus idaeus 'Joan J' Genome v2.0 assembly file is available in FASTA format.
The Rubus idaeus 'Joan J' v2.0 genome gene prediction files are available in FASTA and GFF3 formats.
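After downloading the files, a quick sanity check is to compare the total sequence length and the number of gene features against the figures reported on this page. The sketch below assumes a plain-text (uncompressed) FASTA file and a standard nine-column GFF3 file in which gene features have `gene` in the third column. The helper names `fasta_lengths` and `count_gff3_features` are hypothetical, and the small in-memory records stand in for the real files.

```python
def fasta_lengths(lines):
    """Map each FASTA record name to its sequence length in bp."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def count_gff3_features(lines, feature_type="gene"):
    """Count GFF3 rows whose third column equals feature_type."""
    count = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9 and columns[2] == feature_type:
            count += 1
    return count


# Toy input standing in for the downloaded files (hypothetical records).
toy_fasta = [">chr1 example", "ACGTAC", "GT", ">chr2", "AAAA"]
toy_gff3 = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t8\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t8\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr2\tsrc\tgene\t1\t4\t.\t-\t.\tID=g2",
]

lengths = fasta_lengths(toy_fasta)
print(sum(lengths.values()))          # total assembly length of the toy input
print(count_gff3_features(toy_gff3))  # number of gene features in the toy input
```

Both helpers accept any iterable of lines, so an open file handle can be passed directly in place of the toy lists.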