Rubus occidentalis Whole Genome v3.0 Assembly & Annotation

Analysis Name: Rubus occidentalis Whole Genome v3.0 Assembly & Annotation
Method: Proximo Hi-C scaffolding pipeline
Source: Illumina NextSeq
Date performed: 2018-04-30



VanBuren, R., Wai, C.M., Colle, M., Wang, J., Sullivan, S., Bushakra, J.M., Liachko, I., Vining, K.J., Dossett, M., Finn, C.E., et al. (2018) A near complete, chromosome-scale assembly of the black raspberry (Rubus occidentalis) genome. GigaScience, 7.



The fragmented nature of most draft plant genomes has hindered downstream gene discovery, trait mapping for breeding, and other functional genomics applications. There is a pressing need to improve or finish draft plant genome assemblies.

Here, we present a chromosome-scale assembly of the black raspberry genome using single-molecule real-time Pacific Biosciences sequencing and high-throughput chromatin conformation capture (Hi-C) genome scaffolding. The updated V3 assembly has a contig N50 of 5.1 Mb, representing an ∼200-fold improvement over the previous Illumina-based version. Each of the 235 contigs was anchored and oriented into seven chromosomes, correcting several major misassemblies. Black raspberry V3 contains 47 Mb of new sequences including large pericentromeric regions and thousands of previously unannotated protein-coding genes. Among the new genes are hundreds of expanded tandem gene arrays that were collapsed in the Illumina-based assembly. Detailed comparative genomics with the high-quality V4 woodland strawberry genome (Fragaria vesca) revealed near-perfect 1:1 synteny with dramatic divergence in tandem gene array composition. Lineage-specific tandem gene arrays in black raspberry are related to agronomic traits such as disease resistance and secondary metabolite biosynthesis.

The improved resolution of tandem gene arrays highlights the need to reassemble these highly complex and biologically important regions in draft plant genomes. The updated, high-quality black raspberry reference genome will be useful for comparative genomics across the horticulturally important Rosaceae family and enable the development of marker assisted breeding in Rubus.
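
The contig N50 quoted above is the length L such that contigs of length L or longer together cover at least half of the total assembly. A minimal Python sketch of that calculation follows; the contig lengths are made up for illustration and are not the V3 contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half is the N50.
        if running >= half:
            return length

# Hypothetical contig lengths in bp (illustrative only).
example_lengths = [8_000_000, 5_000_000, 3_000_000, 2_000_000, 1_000_000]
print(n50(example_lengths))
```

Because N50 depends only on the distribution of lengths, the same function applies to contigs or to scaffolds.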


Homology of the Rubus occidentalis v3.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for NCBI nr (Release 2017-07), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (TAIR10), UniProtKB/SwissProt (Release 2018-04), and UniProtKB/TrEMBL (Release 2018-04) databases. The best hit reports are available for download in Excel format.
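
As a rough illustration of how an E-value cutoff and a best-hit selection work together, the sketch below filters BLAST tabular output and keeps one hit per query. It assumes the 12 default columns of `-outfmt 6` and picks the best hit by bit score; the page does not state the output format or the tie-breaking rule actually used, and the function name, dictionary keys, and example rows are hypothetical.

```python
import csv
import io

# Cutoffs stated in the text; the dictionary keys are illustrative labels.
EVALUE_CUTOFFS = {"nr": 1e-9, "tair": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}

def best_hits(handle, cutoff):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumes the 12 default columns of BLAST tabular output (-outfmt 6).
    Hits whose E-value is not below `cutoff` are discarded; among the rest,
    the hit with the highest bit score is kept.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Made-up rows for illustration; all identifiers are hypothetical.
example = io.StringIO(
    "protA\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "protA\tsubj2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t230\n"
    "protB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t55\n"
)
hits = best_hits(example, EVALUE_CUTOFFS["nr"])
print(hits)
```

In this example, `protB` has no hit below the nr cutoff of 1e-9, so it would belong in a "noHit" set for nr, while the same row would pass the 1e-6 cutoff used for the other databases.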


Protein Homologs

Rubus occidentalis v3.0 proteins with NCBI nr homologs (EXCEL file) rubus_occidentalis_v3.0_vs_nr.xlsx.gz
Rubus occidentalis v3.0 proteins with NCBI nr (FASTA file) rubus_occidentalis_v3.0_vs_nr_hit.fasta.gz
Rubus occidentalis v3.0 proteins without NCBI nr (FASTA file) rubus_occidentalis_v3.0_vs_nr_noHit.fasta.gz
Rubus occidentalis v3.0 proteins with Arabidopsis (TAIR10) homologs (EXCEL file) rubus_occidentalis_v3.0_vs_tair.xlsx.gz
Rubus occidentalis v3.0 proteins with Arabidopsis (TAIR10) (FASTA file) rubus_occidentalis_v3.0_vs_tair_hit.fasta.gz
Rubus occidentalis v3.0 proteins without Arabidopsis (TAIR10) (FASTA file) rubus_occidentalis_v3.0_vs_tair_noHit.fasta.gz
Rubus occidentalis v3.0 proteins with SwissProt homologs (EXCEL file) rubus_occidentalis_v3.0_vs_swissprot.xlsx.gz
Rubus occidentalis v3.0 proteins with SwissProt (FASTA file) rubus_occidentalis_v3.0_vs_swissprot_hit.fasta.gz
Rubus occidentalis v3.0 proteins without SwissProt (FASTA file) rubus_occidentalis_v3.0_vs_swissprot_noHit.fasta.gz
Rubus occidentalis v3.0 proteins with TrEMBL homologs (EXCEL file) rubus_occidentalis_v3.0_vs_trembl.xlsx.gz
Rubus occidentalis v3.0 proteins with TrEMBL (FASTA file) rubus_occidentalis_v3.0_vs_trembl_hit.fasta.gz
Rubus occidentalis v3.0 proteins without TrEMBL (FASTA file) rubus_occidentalis_v3.0_vs_trembl_noHit.fasta.gz



All annotation files are available for download by selecting the desired data type in the left-hand side bar. Each data type page provides a description of the available files and links to download them.



Chromosome (FASTA file) Rubus occidentalis v3.0.fasta.gz


Gene Predictions

The Rubus occidentalis v3.0 gene prediction files are available in FASTA and GFF3 formats.


Genes (GFF3 file) Rubus occidentalis v3.0.genes.gff3.gz
CDS (FASTA file) Rubus occidentalis v3.0.cds.fa.gz
Proteins (FASTA file) Rubus occidentalis v3.0.proteins.fa.gz
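
Once downloaded, the gzip-compressed GFF3 file can be inspected without decompressing it first. The sketch below counts feature types (column 3 of GFF3); the function names are hypothetical, the example records are made up, and the feature types present in the real file may differ.

```python
import gzip
from collections import Counter

def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[fields[2]] += 1
    return counts

def count_feature_types_in_file(path):
    """Same as above, for a gzip-compressed GFF3 file on disk."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)

# Made-up records; sequence and gene identifiers are hypothetical.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tsrc\tCDS\t150\t800\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr1\tsrc\tgene\t2000\t2500\t.\t-\t.\tID=gene2",
]
counts = count_feature_types(example)
print(counts["gene"], counts["mRNA"], counts["CDS"])
```

For the real data, `count_feature_types_in_file` would be called with the local path of the downloaded genes GFF3 file.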

Functional Analysis

Functional annotations for the Rubus occidentalis v3.0 genome are available for download below. The Rubus occidentalis proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


GO assignments from InterProScan rubus_occidentalis_v3.0_genes2GO.xlsx.gz
IPR assignments from InterProScan rubus_occidentalis_v3.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs rubus_occidentalis_v3.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways rubus_occidentalis_v3.0_KEGG-pathways.xlsx.gz