Rubus idaeus 'Joan J' Genome v2.0 Assembly & Annotation

Overview
Analysis Name: Rubus idaeus 'Joan J' Genome v2.0 Assembly & Annotation
Method: Canu, Racon, and Pilon
Source: DNA-seq from Nanopore, PacBio, and Illumina (submitted to SRA with BioProject ID PRJNA869453)
Date performed: 2022-09-15

Publication

Zhou, J., Li, M., Li, Y., Xiao, Y., Luo, X., Gao, S., Ma, Z., Sadowski, N., Timp, W., Dardick, C., & [et al.]. (2023). Comparison of red raspberry and wild strawberry fruits reveals mechanisms of fruit type specification. Plant Physiology, kiad409. https://doi.org/10.1093/plphys/kiad409.

Genome assembly
Canu was used to assemble the Nanopore reads into draft contigs. The contigs were first polished by Racon using Nanopore and PacBio reads, and then corrected by Pilon using Illumina reads. A haplotype-fused assembly was constructed with the Purge Haplotigs pipeline using the Nanopore reads. The Nanopore reads were then used to build super-scaffolds with SSPACE-LongRead and to fill gaps with GapFinisher. The resulting genome assembly was corrected again by Pilon using Illumina reads. Finally, the pseudochromosomes were constructed with ALLMAPS.


Genome annotation
LTR-retriever and RepeatModeler were used to build a de novo repeat library, which was then supplied to RepeatMasker for repeat annotation and masking. Genome annotation combined ab initio gene models, transcript evidence (RNA-Seq data), and protein homology-based evidence. Potential gene models in the repeat-masked genome were predicted by MAKER, AUGUSTUS, and BRAKER2. EVM was used to generate confident consensus gene models from the gene models produced by the three predictors, the transcript evidence, and the protein evidence, using nonstochastic weight values. The EVM models were then improved by PASA and manually curated in Apollo.

The final genome assembly has a total length of 297,436,202 bp with scaffold N50 of 33.74 Mb. 33,865 protein-coding genes were identified in the genome with a BUSCO completeness score of 96.5%.
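
For readers who want to check a scaffold N50 on their own copy of an assembly, the statistic can be sketched as follows. This is a minimal illustration, not the procedure used for this analysis: the function name `n50` and the toy lengths are hypothetical, and the definition assumed is the usual one (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (assumed definition:
    length at which the running total of sorted lengths reaches half the sum)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths (hypothetical, not taken from the assembly).
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 40
```

In practice the lengths would come from the sequences in the assembly FASTA file rather than from a hard-coded list.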

Homology

Homology of the Rubus idaeus 'Joan J' Genome v2.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for the NCBI nr (Release 2021-09) database, and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.

Protein Homologs

Rubus idaeus 'Joan J' v2.0 proteins with NCBI nr homologs (EXCEL file) Ridaeus_JoanJ_v2.0_vs_nr.xlsx.gz
Rubus idaeus 'Joan J' v2.0 proteins with NCBI nr homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_nr_hit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins without NCBI nr homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_nr_noHit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Ridaeus_JoanJ_v2.0_vs_arabidopsis.xlsx.gz
Rubus idaeus 'Joan J' v2.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_arabidopsis_hit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_arabidopsis_noHit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins with SwissProt homologs (EXCEL file) Ridaeus_JoanJ_v2.0_vs_swissprot.xlsx.gz
Rubus idaeus 'Joan J' v2.0 proteins with SwissProt homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_swissprot_hit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins without SwissProt homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_swissprot_noHit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins with TrEMBL homologs (EXCEL file) Ridaeus_JoanJ_v2.0_vs_trembl.xlsx.gz
Rubus idaeus 'Joan J' v2.0 proteins with TrEMBL homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_trembl_hit.fasta.gz
Rubus idaeus 'Joan J' v2.0 proteins without TrEMBL homologs (FASTA file) Ridaeus_JoanJ_v2.0_vs_trembl_noHit.fasta.gz
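
The expectation value cutoffs described in the Homology section can be illustrated with a short best-hit filter. This is a sketch under stated assumptions, not the pipeline used to produce the files above: it assumes the blastp results are in the standard 12-column tabular layout (BLAST+ `-outfmt 6`, with the e-value in column 11 and the bit score in column 12), it assumes "best hit" means the highest bit score per query, and the function name `best_hits` and the example rows are hypothetical.

```python
import csv


def best_hits(tabular_lines, max_evalue):
    """For each query, keep the highest-bit-score hit whose e-value is below max_evalue.

    Assumes the default 12-column BLAST tabular layout (an assumption, see lead-in).
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue  # the cutoff is "less than", so equal values are rejected
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example rows; identifiers and scores are made up for illustration.
example = [
    "geneA\tsubj1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t300",
    "geneA\tsubj2\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-80\t400",
    "geneB\tsubj3\t40.0\t80\t48\t0\t1\t80\t1\t80\t1e-7\t50",
]

nr_style = best_hits(example, max_evalue=1e-9)         # cutoff stated for NCBI nr
swissprot_style = best_hits(example, max_evalue=1e-6)  # cutoff stated for the other databases
print(sorted(nr_style))         # prints ['geneA']
print(sorted(swissprot_style))  # prints ['geneA', 'geneB']
```

The hypothetical `geneB` hit has an e-value of 1e-7, so it is dropped under the 1e-9 cutoff but kept under the 1e-6 cutoff.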

 

Assembly

The Rubus idaeus 'Joan J' Genome v2.0 assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) Ridaeus_JoanJ_v2.0.fasta.gz

 

Gene Predictions

The Rubus idaeus 'Joan J' v2.0 genome gene prediction files are available in FASTA and GFF3 formats.

Downloads

Protein sequences  (FASTA file) Ridaeus_JoanJ_v2.0.proteins.fasta.gz
Transcript sequences  (FASTA file) Ridaeus_JoanJ_v2.0.transcripts.fasta.gz
Genes (GFF3 file) Ridaeus_JoanJ_v2.0.genes.gff3.gz

 

Functional Analysis

Functional annotations for the Rubus idaeus 'Joan J' Genome v2.0 are available for download below. The Rubus idaeus 'Joan J' Genome v2.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan ridaeus_JoanJ_v2.0_genes2GO.xlsx.gz
IPR assignments from InterProScan ridaeus_JoanJ_v2.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs ridaeus_JoanJ_v2.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways ridaeus_JoanJ_v2.0_KEGG-pathways.xlsx.gz

 

Transcript Alignments
Transcript alignments were performed by the GDR Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the Rubus idaeus genome assembly. Alignments covering at least 97% of the transcript length with at least 97% identity were retained. The available files are in GFF3 format.

 

Fragaria x ananassa GDR RefTrans v1 ridaeus_JoanJ_v2.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 ridaeus_JoanJ_v2.0_p.avium_GDR_reftransV1
Rosa GDR RefTrans v1 ridaeus_JoanJ_v2.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 ridaeus_JoanJ_v2.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 ridaeus_JoanJ_v2.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 ridaeus_JoanJ_v2.0_pyrus_GDR_reftransV1
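
The 97% alignment length and 97% identity filter described for the transcript alignments can be sketched as below. This is an illustration only, and the exact definitions used by the GDR Team are not stated on this page. The sketch assumes BLAT's native PSL output is the input, takes identity as (matches + repMatches) / (matches + repMatches + misMatches), and takes alignment length as the fraction of the transcript spanned, (qEnd - qStart) / qSize. The function name `passes_cutoffs` and the example numbers are hypothetical.

```python
def passes_cutoffs(matches, mismatches, rep_matches, q_size, q_start, q_end,
                   min_coverage=0.97, min_identity=0.97):
    """Return True if an alignment meets both thresholds.

    Arguments mirror PSL fields (matches, misMatches, repMatches, qSize,
    qStart, qEnd). The identity and coverage formulas are assumptions.
    """
    aligned_bases = matches + mismatches + rep_matches
    if aligned_bases == 0 or q_size == 0:
        return False  # nothing aligned, or no query length to compare against
    identity = (matches + rep_matches) / aligned_bases
    coverage = (q_end - q_start) / q_size
    return coverage >= min_coverage and identity >= min_identity


# Hypothetical alignments of a 1,000-base transcript.
print(passes_cutoffs(980, 10, 0, 1000, 0, 990))  # prints True
print(passes_cutoffs(900, 50, 0, 1000, 0, 950))  # prints False
```

The first hypothetical alignment spans 99% of the transcript at about 99% identity and is kept; the second spans 95% at about 94.7% identity and is rejected on both criteria.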