Rubus occidentalis Whole Genome Assembly v1.0 & Annotation v1

Analysis Name: Rubus occidentalis Whole Genome Assembly v1.0 & Annotation v1
Source: Illumina paired-end reads
Date performed: 2016-03-18

About the Assembly


Black raspberry (Rubus occidentalis L.) is a minor but important specialty fruit crop in the United States Pacific Northwest, prized for its unique flavor and potential health benefits. Black raspberry is diploid (2n = 2x = 14) and belongs to the same subgenus (Idaeobatus) as red raspberry (R. idaeus L.), with which it can be readily crossed. Global commercial raspberry production exceeds 500,000 metric tons. Since the early 1900s, black raspberry production in the U.S. has declined markedly, which many attribute to disease pressure and a lack of cultivars with sufficient resistance. Black raspberry cultivars suffer from limited genetic diversity, stemming from the narrow gene pool used in the elite germplasm, and from a lack of breeding progress. Black raspberry genomic resources will be useful for Rubus breeding programs and for comparative genomics within the Rosaceae.

Genome facts and statistics

The black raspberry accession ORUS 4115-3 was chosen for sequencing because of its low residual within-genome heterozygosity and apparent tolerance to Verticillium wilt (Verticillium dahliae Kleb.), a soil-borne fungal disease and a leading cause of stand decline in commercial fields in Oregon. The black raspberry genome was sequenced using eight Illumina paired-end libraries with insert sizes ranging from 165 base pairs (bp) to 4,700 bp, collectively representing 325x coverage of the estimated 293 megabase (Mb) genome. The final ALLPATHS-based assembly includes 9,245 contiguous sequences (contigs) in 2,226 scaffolds spanning 243 Mb, or 83% of the estimated genome. The scaffold N50 length is 353 kilobase pairs (kb), with half of the assembly contained in the largest 178 scaffolds. The black raspberry scaffolds were assembled into seven pseudomolecules using a high-density genetic map derived from an F1 population of 115 plants. In total, 626 scaffolds were anchored to the seven black raspberry pseudochromosomes, collectively spanning 203 Mb or 84.5% of the assembly.
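
The N50 and scaffold-count figures above follow a standard definition: sort the scaffolds from longest to shortest and walk down the list until the running total reaches half of the assembly length. The length of the scaffold reached is the N50, and the number of scaffolds used is often called the L50. The following is a minimal sketch of that calculation, not the script used for the published statistics; the function name and the toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # length is the N50; count is how many sequences were needed (L50)
            return length, count


# Toy lengths for illustration only (not taken from the assembly).
toy_n50, toy_l50 = n50_l50([100, 80, 60, 40, 20])

# Percentage of the estimated 293 Mb genome covered by the 243 Mb assembly.
assembled_percent = round(243 / 293 * 100)
```

Applied to the real scaffold lengths, the same function would be expected to report values matching the 353 kb N50 and 178 scaffolds quoted above.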

MAKER was used to annotate the black raspberry genome. A total of 33,783 reference-guided and 71,622 de novo-assembled transcripts were generated from RNA sequencing (RNA-seq) reads from young leaf, Verticillium-inoculated and un-inoculated root, green fruit, red fruit, ripe fruit, and cane tissue. These transcripts were clustered into 29,460 representative sequences that were used as input to MAKER, producing a preliminary set of 32,300 putative gene models. Gene models with no functional annotation based on InterProScan and BLASTP were removed, leaving a final set of 28,005 protein-coding genes.

More information about the genome assembly can be found in the publication (VanBuren R, Bryant D, Bushakra JM, Vining KJ, Edger PP, Rowley ER, Priest HD, Michael TP, Lyons E, Filichkin SA, Dossett M, Finn CE, Bassil NV, Mockler TC. The genome of black raspberry (Rubus occidentalis). The Plant Journal. 2016; 87(6):535-547.)


Homology of the Rubus occidentalis v1.0.a1 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. The results are available for download in Excel format. An expectation value (E-value) cutoff of less than 1e-6 was used for Arabidopsis proteins and 1e-9 for the NCBI nr, UniProt Swiss-Prot, and UniProt TrEMBL databases.
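
To illustrate how an expectation value cutoff like the ones above is applied, the sketch below filters tab-delimited BLAST hits and keeps only those below a threshold. It assumes the default BLAST+ tabular layout (-outfmt 6), in which the E-value is the eleventh column; the downloadable Excel files may use a different column layout, and the query and subject identifiers shown are made up.

```python
import csv
import io

# 0-based index of the evalue column in default BLAST+ tabular output
# (-outfmt 6). This layout is an assumption about the input.
EVALUE_COLUMN = 10


def filter_hits(handle, max_evalue):
    """Yield tabular BLAST rows whose E-value is below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[EVALUE_COLUMN]) < max_evalue:
            yield row


# Two hypothetical hits: only the first passes the 1e-6 cutoff.
example = io.StringIO(
    "query1\tsubject1\t62.1\t300\t100\t2\t1\t300\t5\t304\t1e-50\t180\n"
    "query2\tsubject2\t30.0\t80\t50\t3\t1\t80\t10\t89\t1e-4\t40\n"
)
kept = list(filter_hits(example, max_evalue=1e-6))
```

The same function could be called with max_evalue=1e-9 to mirror the stricter cutoff used for the nr, Swiss-Prot, and TrEMBL comparisons.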


Protein Homologs

Black raspberry proteins with NCBI nr homologs Rubus_occidentalis_v1.0.a1_vs_nr.xlsx
Black raspberry proteins with Arabidopsis homologs Rubus_occidentalis_v1.0.a1_vs_arabidopsis.xlsx
Black raspberry proteins with Swiss-Prot homologs Rubus_occidentalis_v1.0.a1_vs_swissprot.xlsx
Black raspberry proteins with TrEMBL homologs Rubus_occidentalis_v1.0.a1_vs_trembl.xlsx



All assembly and annotation files are available for download by selecting the desired data type in the right-hand sidebar. Each data type page provides a description of the available files and links to download them.


The Rubus occidentalis v1.0.a1 genome assembly files are available in FASTA and GFF3 formats. There are a total of 2,226 scaffolds in this assembly.


Scaffolds (FASTA file) Rubus occidentalis_v1.0.a1.scaffolds.fasta.gz
Scaffolds (GFF3 file) Rubus occidentalis_v1.0.a1.scaffolds.gff3.gz
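
After downloading the scaffold FASTA file, the stated total of 2,226 scaffolds can be checked by reading the gzip-compressed file directly. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the demonstration writes a tiny made-up FASTA file rather than using the real download.

```python
import gzip
import os
import tempfile


def fasta_lengths(path):
    """Return {sequence_id: length} for a gzip-compressed FASTA file."""
    lengths = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The identifier is the first word after ">".
                header = line[1:].split()
                current = header[0] if header else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


# Demonstration on a made-up two-record file (not real assembly data).
with tempfile.TemporaryDirectory() as workdir:
    toy_path = os.path.join(workdir, "toy.fasta.gz")
    with gzip.open(toy_path, "wt") as out:
        out.write(">s1 example\nACGT\nAC\n>s2\nGG\n")
    toy_lengths = fasta_lengths(toy_path)

toy_count = len(toy_lengths)
```

Run against the downloaded scaffold file, the number of entries in the returned dictionary should equal the scaffold count, and the lengths can be summed to check the assembly size.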


Gene Predictions

The Rubus occidentalis v1.0.a1 genome gene prediction files are available in FASTA and GFF3 formats.


Transcript CDS sequences (FASTA file) Rubus_occidentalis_v1.0.a1.transcripts.fasta.gz
Protein sequences (FASTA file) Rubus_occidentalis_v1.0.a1.proteins.fasta.gz
Genes, CDS, 5' UTR, 3' UTR locations (GFF3 file) Rubus_occidentalis_v1.0.a1.genes.gff3.gz
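
GFF3 is a nine-column, tab-delimited format in which the third column gives the feature type, so the contents of the gene prediction GFF3 file can be summarized by tallying that column. The following is a minimal sketch under that assumption; the function name is hypothetical, the exact feature type names used in the actual file are not confirmed here, and the demonstration records are made up.

```python
import gzip
import os
import tempfile
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records by feature type (column 3) in a gzipped file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # any embedded sequence section follows; stop here
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts


# Demonstration on made-up records (not real annotation data).
with tempfile.TemporaryDirectory() as workdir:
    toy_path = os.path.join(workdir, "toy.gff3.gz")
    with gzip.open(toy_path, "wt") as out:
        out.write("##gff-version 3\n")
        out.write("chr1\tdemo\tgene\t1\t900\t.\t+\t.\tID=g1\n")
        out.write("chr1\tdemo\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n")
        out.write("chr1\tdemo\tCDS\t1\t300\t.\t+\t0\tID=c1;Parent=g1.t1\n")
        out.write("chr1\tdemo\tCDS\t500\t900\t.\t+\t0\tID=c2;Parent=g1.t1\n")
    toy_counts = count_feature_types(toy_path)
```

If the downloaded file labels genes with the type "gene", the count for that type would be expected to match the number of annotated protein-coding genes.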


Functional Analysis

Functional annotations for the Rubus occidentalis v1.0.a1 genome are available for download below. The Rubus occidentalis proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


Gene functions annotated by InterProScan Rubus_occidentalis_v1.0.a1_functions.txt.gz
GO assignments from InterProScan Rubus_occidentalis_v1.0.a1_genes2GO.txt.gz
IPR assignments from InterProScan Rubus_occidentalis_v1.0.a1_genes2IPR.txt.gz
KEGG Hierarchy file (for viewing with KegHier) Rubus_occidentalis_v1.0.a1_KEGG-hier.tar.gz
Proteins mapped to KEGG Orthologs Rubus_occidentalis_v1.0.a1_KEGG-orthologis.txt.gz
Proteins mapped to KEGG Pathways Rubus_occidentalis_v1.0.a1_KEGG-pathways.txt.gz