Fragaria x ananassa Yanli Genome v1.0 Assembly & Annotation

Overview
Analysis Name: Fragaria x ananassa Yanli Genome v1.0 Assembly & Annotation
Method: Hifiasm (0.16.1)
Source: PacBio HiFi and Illumina for Yanli
Date performed: 2023-01-26

Publication: Mao JX, Wang Y, Wang BT, Li JQ, Zhang C, Zhang WS, Li X, Li J, Zhang JX, Li H, Zhang ZH. High-quality haplotype-resolved genome assembly of cultivated octoploid strawberry. Horticulture Research. Volume 10, Issue 1, January 2023, uhad002, doi: 10.1093/hr/uhad002

Materials and Methods:

Genome assembly

Mature leaves of Fragaria x ananassa ‘Yanli’ were used for genomic DNA extraction. The quality and concentration of the DNA were determined using 1% agarose gel electrophoresis and a Qubit 3.0 fluorometer. PacBio’s standard protocol was used to build SMRTbell target size libraries. The library was sequenced on a PacBio Sequel II with primer V2 and Sequel II binding kit 2.0. The de novo assembly was performed using Hifiasm version 0.16.1.

Hi-C assembly

A Hi-C library was constructed and sequenced on the Illumina HiSeq X Ten platform to anchor contigs onto chromosomes. Qubit 2.0 and Agilent 2100 were used to determine the library concentration and insert size. HiCUP was used to process the Hi-C sequence data, and 3D-DNA was used to assist genome assembly. Reads were aligned to the genome with Bowtie 2. After the Hi-C interaction heatmap matrix was constructed with Juicer version 1.5.6, mis-joins, order, and orientation were corrected with Juicebox version 1.11.08.

Annotation process

A combination of homology-based prediction, de novo prediction, and RNA-seq/EST-based prediction was used to annotate protein-coding genes in the ‘Yanli’ genome. Homologous genes were predicted with Exonerate version 2.2.0 using sequences of Prunus avium, F. vesca, Malus × domestica, Rosa chinensis, and F. × ananassa. De novo prediction was performed using AUGUSTUS version 3.3.2 and GlimmerHMM version 3.0.4. Splice junctions between exons were identified in the RNA-seq data using TopHat version 2.0.4, and the reads were assembled into transcripts using Cufflinks version 2.2.1. The predictions from the three methods were combined using MAKER2 version 2.31.10 to generate a non-redundant and more complete gene set. HiCESAP was used to obtain the final reliable gene set.

 
Table 2. Statistics of two haplotype assemblies

Assembly level                  Name                 Hap1           Hap2
Contig assembly                 Contigs number       628            278
Contig assembly                 Assembly size (bp)   824,838,780    808,073,877
Contig assembly                 N50                  26.70          27.51
Scaffold assembly               Scaffold number      647            316
Scaffold assembly               Assembly size (bp)   824,841,180    808,074,877
Scaffold assembly               N50                  27.31          27.51
Chromosomes Number              -                    28             28
Unanchored Number               -                    619            288
Total Length (Mb)               -                    825            808
Chromosome anchoring rate (%)   -                    95.01          96.29
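The N50 values in Table 2 summarise assembly contiguity. As a minimal sketch of the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly), N50 can be computed from a list of sequence lengths. The function name and the toy lengths below are illustrative assumptions and are not taken from the Yanli assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (not from the Yanli assembly).
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```

The result is in the same unit as the input lengths, so base-pair input has to be converted separately if a value in Mb is wanted.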

Table S15. BUSCO assessment of annotation

Type                          Haplotype 1 proteins   Haplotype 1 percentage (%)   Haplotype 2 proteins   Haplotype 2 percentage (%)
Complete BUSCOs               1,607                  99.6                         1,607                  99.6
Complete Single-Copy BUSCOs   45                     2.8                          40                     2.5
Complete Duplicated BUSCOs    1,562                  96.8                         1,567                  97.1
Fragmented BUSCOs             0                      0.0                          1                      0.1
Missing BUSCOs                7                      0.4                          6                      0.3
Total BUSCO groups searched   1,614                  100.0                        1,614                  100.0
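The BUSCO categories in Table S15 are related by simple arithmetic: complete BUSCOs are the sum of single-copy and duplicated BUSCOs, and all categories together add up to the total number of groups searched. The sketch below recomputes the Haplotype 1 percentages from the counts in the table. The function name is an illustrative assumption, and published percentages may be rounded slightly differently from plain rounding.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Return (total groups, percentage per category) from BUSCO counts."""
    total = single + duplicated + fragmented + missing
    counts = {
        "complete": single + duplicated,
        "single_copy": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    # Percentages are rounded to one decimal place, as in the table.
    return total, {key: round(100 * value / total, 1) for key, value in counts.items()}


# Haplotype 1 counts from Table S15.
total, pct = busco_percentages(single=45, duplicated=1562, fragmented=0, missing=7)
print(total, pct)
```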

 
Homology

Homology of the Fragaria x ananassa Yanli Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for NCBI nr (Release 2021-09), and a cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
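As a rough illustration of how best hits under an E-value cutoff can be selected, the sketch below filters BLAST results in the standard 12-column tabular layout (query, subject, percent identity, ..., E-value, bit score) and keeps the highest-scoring hit per query. The tabular input, the choice of bit score to rank hits, the function name, and the example rows are all assumptions for illustration; the source does not describe how the best hit reports were produced.

```python
def best_hits(lines, max_evalue):
    """Keep the top-scoring hit per query among hits with E-value < max_evalue.

    Assumes the standard 12-column BLAST tabular layout, with the E-value in
    column 11 and the bit score in column 12.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # fails the cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows with hypothetical identifiers.
rows = [
    "geneA\tsubj1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "geneA\tsubj2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250",
    "geneB\tsubj3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t50",
]
print(best_hits(rows, max_evalue=1e-9))  # cutoff used for NCBI nr
print(best_hits(rows, max_evalue=1e-6))  # cutoff used for the other databases
```

With the stricter 1e-9 cutoff the hypothetical geneB hit is dropped, while the 1e-6 cutoff keeps it.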

 

Protein Homologs

Fragaria x ananassa v1.0 proteins with NCBI nr homologs (EXCEL file) Fxananassa_Yanli_v1.0_vs_nr.xlsx.gz
Fragaria x ananassa v1.0 proteins with NCBI nr (FASTA file) Fxananassa_Yanli_v1.0_vs_nr_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without NCBI nr (FASTA file) Fxananassa_Yanli_v1.0_vs_nr_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) Fxananassa_Yanli_v1.0_vs_arabidopsis.xlsx.gz
Fragaria x ananassa v1.0 proteins with Arabidopsis (Araport11) (FASTA file) Fxananassa_Yanli_v1.0_vs_arabidopsis_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without Arabidopsis (Araport11) (FASTA file) Fxananassa_Yanli_v1.0_vs_arabidopsis_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with SwissProt homologs (EXCEL file) Fxananassa_Yanli_v1.0_vs_swissprot.xlsx.gz
Fragaria x ananassa v1.0 proteins with SwissProt (FASTA file) Fxananassa_Yanli_v1.0_vs_swissprot_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without SwissProt (FASTA file) Fxananassa_Yanli_v1.0_vs_swissprot_noHit.fasta.gz
Fragaria x ananassa v1.0 proteins with TrEMBL homologs (EXCEL file) Fxananassa_Yanli_v1.0_vs_trembl.xlsx.gz
Fragaria x ananassa v1.0 proteins with TrEMBL (FASTA file) Fxananassa_Yanli_v1.0_vs_trembl_hit.fasta.gz
Fragaria x ananassa v1.0 proteins without TrEMBL (FASTA file) Fxananassa_Yanli_v1.0_vs_trembl_noHit.fasta.gz

 

Assembly

The Fragaria x ananassa Yanli Genome v1.0 assembly file is available in FASTA format.

Downloads

Contigs (FASTA file) Fxananassa_Yanli_v1.0.fasta.gz
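A gzip-compressed FASTA file such as the one listed above can be read without decompressing it first. The sketch below collects the length of each sequence using only the Python standard library; the function name and the tiny demo file are illustrative assumptions, and every header line is assumed to carry a non-empty identifier.

```python
import gzip
import os
import tempfile


def fasta_lengths(path):
    """Return {sequence id: length in bp} for a gzip-compressed FASTA file."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the id.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Demo on a tiny made-up FASTA file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fasta.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(">seq1 toy record\nACGT\nAC\n>seq2\nGGG\n")
    demo_lengths = fasta_lengths(demo_path)

print(demo_lengths)
```

Assuming the assembly has been downloaded to the working directory, the same function could be called as fasta_lengths("Fxananassa_Yanli_v1.0.fasta.gz").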

 

Gene Predictions

The Fragaria x ananassa Yanli v1.0 genome gene prediction files are available in FASTA and GFF3 formats.

Downloads

Protein sequences  (FASTA file) Fxananassa_Yanli_v1.0.proteins.fasta.gz
CDS  (FASTA file) Fxananassa_Yanli_v1.0.cds.fasta.gz
CDS (Hap1, FASTA file) Fxananassa_Yanli_v1.0.hap1.cds.fasta.gz
CDS (Hap2, FASTA file) Fxananassa_Yanli_v1.0.hap2.cds.fasta.gz
Genes (GFF3 file) Fxananassa_Yanli_v1.0.genes.gff3.gz
Genes (Hap1, GFF3 file) Fxananassa_Yanli_v1.0.hap1.genes.gff3.gz
Genes (Hap2, GFF3 file) Fxananassa_Yanli_v1.0.hap2.genes.gff3.gz
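GFF3 files are tab-delimited with nine columns, and the feature type (for example gene, mRNA, or CDS) is in the third column. A minimal sketch for tallying feature types is shown below; the function name and the toy records are illustrative assumptions and do not reflect the content of the Yanli annotation.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 records per feature type (column 3)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        counts[fields[2]] += 1
    return counts


# Made-up records with hypothetical identifiers.
toy_gff3 = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tdemo\tCDS\t1\t100\t.\t+\t0\tID=c1;Parent=t1",
    "chr1\tdemo\tgene\t200\t300\t.\t-\t.\tID=g2",
]
feature_counts = count_feature_types(toy_gff3)
print(feature_counts)
```

For a downloaded file, the lines can be supplied by opening it with gzip.open(path, "rt") and passing the handle to the function.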
Functional Analysis

Functional annotations for the Fragaria x ananassa Yanli Genome v1.0 are available for download below. The proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan Fxananassa_Yanli_v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan Fxananassa_Yanli_v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs Fxananassa_Yanli_v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways Fxananassa_Yanli_v1.0_KEGG-pathways.xlsx.gz

 

Transcript Alignments
Transcript alignments were performed by the GDR Team of Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the Fragaria x ananassa Yanli genome assembly. Alignments with at least 97% alignment length and 97% identity were retained. The available files are in GFF3 format.

 

Fragaria x ananassa GDR RefTrans v1 Fxananassa_Yanli_v1.0_f.x.ananassa_GDR_reftransV1
Prunus avium GDR RefTrans v1 Fxananassa_Yanli_v1.0_p.avium_GDR_reftransV1
Prunus persica GDR RefTrans v1 Fxananassa_Yanli_v1.0_p.persica_GDR_reftransV1
Rosa GDR RefTrans v1 Fxananassa_Yanli_v1.0_rosa_GDR_reftransV1
Rubus GDR RefTrans v2 Fxananassa_Yanli_v1.0_rubus_GDR_reftransV2
Malus x domestica GDR RefTrans v1 Fxananassa_Yanli_v1.0_m.x.domestica_GDR_reftransV1
Pyrus GDR RefTrans v1 Fxananassa_Yanli_v1.0_pyrus_GDR_reftransV1
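The 97% thresholds described for the transcript alignments can be illustrated with a filter over BLAT's PSL output. The sketch below is an assumption-laden example, not the GDR pipeline: it assumes PSL input (21 tab-delimited columns), defines alignment length as the aligned query span divided by the query size, and defines identity as matching bases (including repeat matches) divided by all aligned bases. The source does not state how either value was calculated.

```python
def passes_thresholds(psl_line, min_coverage=0.97, min_identity=0.97):
    """Return True if a PSL record meets both thresholds (assumed definitions)."""
    f = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    coverage = (q_end - q_start) / q_size          # fraction of the transcript aligned
    identity = (matches + rep_matches) / aligned   # fraction of aligned bases that match
    return coverage >= min_coverage and identity >= min_identity


# Made-up PSL records with hypothetical names (tx1, tx2, chr1).
good = "\t".join(["980", "10", "0", "0", "0", "0", "0", "0", "+", "tx1", "1000",
                  "0", "990", "chr1", "5000", "100", "1090", "1", "990,", "0,", "100,"])
short = "\t".join(["900", "50", "0", "0", "0", "0", "0", "0", "+", "tx2", "1000",
                   "0", "950", "chr1", "5000", "100", "1050", "1", "950,", "0,", "100,"])
print(passes_thresholds(good), passes_thresholds(short))
```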