Fragaria iinumae Genome v1.0 Assembly & Annotation

Overview


Analysis Name	Fragaria iinumae Genome v1.0 Assembly & Annotation
Method	PacBio, FALCON (v0.3.0)
Source	Fragaria iinumae Genome v1.0 Assembly & Annotation
Date performed	2019-10-31

Publication

Edger, P.P., McKain, M.R., Yocca, A.E. et al. Reply to: Revisiting the origin of octoploid strawberry. Nat Genet (2019) doi:10.1038/s41588-019-0544-2

Genome assembly and quality evaluation

Genome assembly was performed on PacBio long reads using FALCON v0.3.0 (GitHub, 2018 Mar 18). Total genome coverage before assembly (~172X) was estimated as the total number of bases in the PacBio reads divided by the F. iinumae genome size (265.56 Mb). Error correction and pre-assembly were carried out with the FALCON pipeline after evaluating the outcomes of different FALCON parameters during the pre-assembly process. The draft genome, with a contig N50 of >10 Mb, was polished with Arrow using all SMRT reads and with Pilon v1.22 using the Illumina reads (~107X coverage) with the default settings. A GC depth analysis was conducted to assess potential contamination during sequencing and the coverage of the assembly. The completeness of the genome assembly was also evaluated using BUSCO (Benchmarking Universal Single-Copy Orthologs) software. A high-density linkage map of F. iinumae had previously been constructed from 4,173 markers, with 3,280 from the Array and 893 from genotyping by sequencing. Here we anchored the contigs to this genetic map to obtain a chromosome-scale genome of F. iinumae.
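
The two summary statistics mentioned above (coverage as total read bases divided by genome size, and contig N50) can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and the read total is back-calculated from the reported ~172X rather than taken from the actual data.

```python
def estimated_coverage(total_read_bases, genome_size_bp):
    """Sequencing depth estimated as total read bases / genome size."""
    return total_read_bases / genome_size_bp


def contig_n50(lengths):
    """Length of the contig at which the running total of contig lengths,
    taken from longest to shortest, first reaches half the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


GENOME_SIZE_BP = 265_560_000  # 265.56 Mb, as stated in the text

# Hypothetical read total chosen to reproduce ~172X; not a reported figure.
example_read_bases = 172 * GENOME_SIZE_BP
print(estimated_coverage(example_read_bases, GENOME_SIZE_BP))

# Toy contig lengths, for illustration only.
print(contig_n50([2, 3, 4, 5, 6]))
```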

Genome annotation

For repeat detection, four software packages, i.e., RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html), RepeatScout (http://www.repeatmasker.org), Piler (http://www.drive5.com/piler/), and LTR_Finder (http://tlife.fudan.edu.cn/ltr_finder), were used with the default settings to build a de novo repeat library from our assembly. To identify known transposable elements (TEs) in the genome, RepeatMasker (http://www.repeatmasker.org) was used to screen the assembled genome against the Repbase v22.11 and Mips-REdat libraries. We constructed a de novo long terminal repeat retrotransposon (LTR-RT) library by scanning the assembled F. iinumae genome using LTRharvest (-motif tgca -motifmis 1) and LTR_Finder (LTR length: 100-5000 nt; length between two LTRs: 1000-20000 nt). Homology-based, de novo, and RNA-sequencing (RNA-seq)-based gene prediction methods were used in combination to identify the protein-coding genes in the F. iinumae genome assembly. For homology-based prediction, protein sequences of Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Fragaria vesca, and Malus domestica were used as the references. For de novo prediction, Augustus v2.4, GlimmerHMM v3.0.4, SNAP v2006, GeneID v1.4, and Genscan were used with default parameters. Before gene prediction, all software was trained using the 1000 full-length genes from the homology-based predictions and the Arabidopsis gene model (Supplementary Tables 2 and 4). For RNA-seq-based prediction, TransDecoder v2.0 (http://transdecoder.github.io), GeneMarkS-T v5.1, and PASA v2.0.2 were used. Finally, the results from the three methods were integrated using EVM v1.1.1. All the genes were annotated by aligning them to the NCBI non-redundant protein (NR), Swiss-Prot, and Kyoto Encyclopedia of Genes and Genomes (KEGG, release 84.0) databases. The InterProScan package was then used to annotate the predicted genes using the InterPro (5.21–60.0) database.

Homology

Homology of the Fragaria iinumae genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.

Protein Homologs

Fragaria iinumae v1.0 proteins with NCBI nr homologs (EXCEL file)	fiinumae-v1.0_vs_nr.xlsx.gz
Fragaria iinumae v1.0 proteins with NCBI nr (FASTA file)	fiinumae-v1.0_vs_nr_hit.fasta.gz
Fragaria iinumae v1.0 proteins without NCBI nr (FASTA file)	fiinumae-v1.0_vs_nr_noHit.fasta.gz
Fragaria iinumae v1.0 proteins with arabidopsis (Araport11) homologs (EXCEL file)	fiinumae-v1.0_vs_arabidopsis.xlsx.gz
Fragaria iinumae v1.0 proteins with arabidopsis (Araport11) (FASTA file)	fiinumae-v1.0_vs_arabidopsis_hit.fasta.gz
Fragaria iinumae v1.0 proteins without arabidopsis (Araport11) (FASTA file)	fiinumae-v1.0_vs_arabidopsis_noHit.fasta.gz
Fragaria iinumae v1.0 proteins with SwissProt homologs (EXCEL file)	fiinumae-v1.0_vs_swissprot.xlsx.gz
Fragaria iinumae v1.0 proteins with SwissProt (FASTA file)	fiinumae-v1.0_vs_swissprot_hit.fasta.gz
Fragaria iinumae v1.0 proteins without SwissProt (FASTA file)	fiinumae-v1.0_vs_swissprot_noHit.fasta.gz
Fragaria iinumae v1.0 proteins with TrEMBL homologs (EXCEL file)	fiinumae-v1.0_vs_trembl.xlsx.gz
Fragaria iinumae v1.0 proteins with TrEMBL (FASTA file)	fiinumae-v1.0_vs_trembl_hit.fasta.gz
Fragaria iinumae v1.0 proteins without TrEMBL (FASTA file)	fiinumae-v1.0_vs_trembl_noHit.fasta.gz
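
As a rough illustration of how best-hit reports like those above could be derived, the sketch below filters blastp results by E-value and keeps the lowest-E-value hit per query. It assumes the standard 12-column BLAST tabular output (E-value in the 11th column); the page does not state which output format or selection rule was actually used, and the function name and sample identifiers are hypothetical.

```python
import csv


def best_hits(tabular_lines, max_evalue):
    """Map each query ID to its lowest-E-value hit row, considering only
    hits whose E-value is strictly below max_evalue."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue = row[0], float(row[10])
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < float(best[query][10]):
            best[query] = row
    return best


# Hypothetical hits: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore
sample = [
    "\t".join(["geneA", "subj1", "90.0", "200", "20", "0", "1", "200", "1", "200", "1e-50", "300"]),
    "\t".join(["geneA", "subj2", "95.0", "200", "10", "0", "1", "200", "1", "200", "1e-80", "400"]),
    "\t".join(["geneB", "subj3", "40.0", "50", "30", "0", "1", "50", "1", "50", "1e-7", "40"]),
]

nr_style = best_hits(sample, 1e-9)     # cutoff described for NCBI nr
other_style = best_hits(sample, 1e-6)  # cutoff described for the other databases
print(sorted(nr_style), sorted(other_style))
```

Proteins that never appear as a key in the returned dictionary would correspond to the "without" (noHit) FASTA files.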

Downloads

All assembly and annotation files are available for download by selecting the desired data type in the right-hand sidebar. Each data type page provides a description of the available files and links to download them.

Assembly

The Fragaria iinumae Genome v1.0 assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file)

fiinumae-v1.0.fasta.gz

Gene Predictions

The Fragaria iinumae v1.0 genome gene prediction files are available in FASTA and GFF3 formats.

Downloads

Protein sequences (FASTA file)	fiinumae-v1.0.proteins.fasta.gz
CDS (FASTA file)	fiinumae-v1.0.CDs.fasta.gz
Genes (GFF3 file)	fiinumae-v1.0.genes.gff3.gz
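
A quick sanity check after downloading these files is to count the records they contain. The sketch below is a minimal example using only the Python standard library; the helper names are hypothetical, and it assumes the files are gzip-compressed text in the working directory.

```python
import gzip


def count_fasta_records(path):
    """Count sequences in a gzipped FASTA file (one '>' header per record)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_gff3_features(path, feature_type="gene"):
    """Count GFF3 rows whose third column equals feature_type."""
    count = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # directives and comments
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9 and fields[2] == feature_type:
                count += 1
    return count


# Example calls, assuming the files listed above have been downloaded:
# count_fasta_records("fiinumae-v1.0.proteins.fasta.gz")
# count_gff3_features("fiinumae-v1.0.genes.gff3.gz", "gene")
```

Comparing the protein count with the number of mRNA or gene features in the GFF3 file is one simple way to confirm that a download is complete.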

Functional Analysis

Functional annotation files for the Fragaria iinumae genome v1.0 are available for download below. The Fragaria iinumae genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan	fiinumae-v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan	fiinumae-v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs	fiinumae-v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways	fiinumae-v1.0_KEGG-pathways.xlsx.gz

Transcript Alignments

Transcript alignments were performed by the GDR Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the Fragaria iinumae genome assembly. Alignments with an alignment length of at least 97% and an identity of at least 97% were retained. The available files are in GFF3 format.

Fragaria x ananassa GDR RefTrans v1	Fragaria iinumae_v1.0_f.x.ananassa_GDR_reftransV1
Malus x domestica GDR RefTrans v1	Fragaria iinumae_v1.0_m.x.domestica_GDR_reftransV1
Prunus avium GDR RefTrans v1	Fragaria iinumae_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1	Fragaria iinumae_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1	Fragaria iinumae_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2	Fragaria iinumae_v1.0_rubus_GDR_reftransV2
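
The 97% thresholds described above could be applied to BLAT output roughly as follows. This sketch assumes headerless PSL output and makes two interpretive assumptions that the page does not confirm: alignment length is taken as the aligned fraction of the query transcript, and identity as matching bases over aligned bases. The function name and the sample record are hypothetical.

```python
def passes_thresholds(psl_line, min_coverage=0.97, min_identity=0.97):
    """Return True if a PSL record meets both the query-coverage and the
    identity threshold (both definitions are assumptions, see lead-in)."""
    f = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    identity = (matches + rep_matches) / aligned
    coverage = (q_end - q_start) / q_size
    return identity >= min_identity and coverage >= min_coverage


# Hypothetical PSL record (21 tab-separated columns).
good = "\t".join([
    "980", "10", "0", "0", "0", "0", "0", "0", "+",
    "tx1", "1000", "5", "995",
    "chr1", "5000000", "100", "1090",
    "1", "990,", "5,", "100,",
])
print(passes_thresholds(good))
```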

Links

View Assembly in JBrowse

BLAST

View Synteny