Pyrus communis Genome v1.0 Draft Assembly & Annotation

Overview
Analysis Name: Pyrus communis Genome v1.0 Draft Assembly & Annotation
Method: Roche GS De Novo Assembler (2.7)
Source: GS FLX+ sequencing of 2k and 7k paired-end shotgun libraries
Date performed: 2013-02-19

From this page you can browse and download the draft whole genome sequence, predicted gene models, functional annotations, and more from the Pyrus communis (European pear) draft genome assembly v1.0. Select a link in the Resources box for further details.

Assembly Overview

For this assembly, DNA was extracted from leaves of P. communis trees grown at Plant & Food Research in New Zealand and Foundation-Istituto Agrario di San Michele all’Adige in Italy. Two paired-end libraries were constructed, with 2 kb and 7 kb inserts respectively. Sequencing was performed using the GS FLX+ series with the GS FLX Titanium Sequencing Kit XL+, and the resulting reads were assembled using Roche's 454 assembler (version 2.7), generating 142,083 scaffolds longer than 499 bp. This assembly represents 577.3 Mb of the estimated 600 Mb genome, with the largest scaffold at 1.2 Mb. The scaffolds were masked using RepeatMasker (with the rosid clade from RepBase), and genes were predicted using a strategy that combined Augustus, GeneWise, RNA-seq alignments and homology to other predicted Rosaceae genes. De novo repeats were identified using RepeatScout.

Publications

Chagné D, Crowhurst RN, Pindo M, Thrimawithana A, Deng C, Ireland H, Fiers M, Dzierzon H, Cestaro A, Fontana P, Bianco L, Lu A, Storey R, Knäbel M, Saeed M, Montanari S, Kim YK, Nicolini D, Larger S, Stefani E, Allan AC, Bowen J, Harvey I, Johnston J, Malnoy M, Troggio M, Perchepied L, Sawyer G, Wiedow C, Won K, Viola R, Hellens RP, Brewer L, Bus VG, Schaffer RJ, Gardiner SE, Velasco R. The Draft Genome Sequence of European Pear (Pyrus communis L. 'Bartlett'). PLoS ONE. 2014; 9(4):e92644.

Additional Annotations by the GDR Team

The predicted genes from this assembly were further annotated with InterPro domains, GO Terms, KEGG pathways and orthologs by the GDR Team using InterProScan and the KEGG KAAS online service.

Downloads

All assembly and annotation files are available for download by selecting the desired data type in the right-hand sidebar. Each data type page provides a description of the available files and links to download them.

Assembly

Sequencing was performed using the GS FLX+ series with the GS FLX Titanium Sequencing Kit XL+, and the resulting reads were assembled using Roche's 454 assembler (version 2.7), generating 142,083 scaffolds longer than 499 bp. This assembly represents 577.3 Mb of the estimated 600 Mb genome, with the largest scaffold at 1.2 Mb. Repeat-masked scaffolds are also provided below. See the 'Predicted Repeats' link for further information about the identification of repeats.

Downloads

Scaffolds (FASTA file) Pyrus_communis_v1.0-scaffolds.fna.gz
Scaffolds (GFF3 file) Pyrus_communis_v1.0-scaffolds.gff3.gz
Contigs (GFF3 file) Pyrus_communis_v1.0-contigs.gff3.gz
Repeat masked scaffolds, masked with N (FASTA file) Pyrus_communis_v1.0-scaffolds_N_masked.fna.gz
Repeat masked scaffolds, masked with X (FASTA file) Pyrus_communis_v1.0-scaffolds_X_masked.fna.gz
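
The summary figures quoted above (number of scaffolds longer than 499 bp, total assembled bases, largest scaffold) can be recomputed from the scaffold FASTA file. The following is a minimal Python sketch, not the procedure used by the assembly authors; the function name scaffold_stats and the toy input are illustrative assumptions.

```python
import io

def scaffold_stats(fasta_lines, min_len=500):
    """Return (count, total_bp, largest_bp) for sequences of at least min_len bp."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    kept = [n for n in lengths if n >= min_len]
    return len(kept), sum(kept), max(kept, default=0)

# Toy input standing in for the real scaffold file (hypothetical sequences).
demo = io.StringIO(">s1\nACGT\nAC\n>s2\nACGTACGTAC\n>s3\nA\n")
print(scaffold_stats(demo, min_len=5))  # (2, 16, 10)
```

To run this on the downloaded file, the gzipped FASTA can be opened in text mode with gzip.open(path, "rt") from the Python standard library and passed to the function in place of the toy input.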

 

Gene Predictions

Genes were predicted using a hybrid approach. First, RNA-seq reads generated for this project were assembled into transcript contigs using trans-ABySS. These transcript contigs were then used as a training set for de novo predictions with Augustus. Augustus predictions were generated for both the repeat-masked and unmasked scaffold sequences. Next, predicted proteins from other Rosaceae species (e.g. apple, Chinese pear, peach and strawberry) were aligned to the repeat-masked scaffolds using tblastn. GeneWise was then used to predict gene models in regions of matched homology where matches were 79% identical or greater. Gene models from GeneWise were reviewed using Evigene and only the best models were retained. Finally, gene models were retained if they appeared in both the Augustus set and the GeneWise set and were present in the same repeat-masked scaffold regions.

Below, the final hybrid gene set for this v1.0 assembly is available in GFF3 and FASTA format. The Augustus gene models that were used to create the final hybrid set are also available.

Downloads

Hybrid gene predictions (GFF3) Pyrus_communis_v1.0-genes_hybrid.gff3.gz
Hybrid mRNA (unspliced) nucleotide sequences (FASTA) Pyrus_communis_v1.0-mRNA_hybrid.fna.gz
Hybrid protein sequences  (FASTA) Pyrus_communis_v1.0-proteins_hybrid.faa.gz
Augustus gene predictions (GFF3) Pyrus_communis_v1.0-genes_augustus.gff3.gz
Augustus mRNA (unspliced) nucleotide sequences (FASTA) Pyrus_communis_v1.0-mRNA_augustus.fna.gz
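
The final retention step described above keeps only gene models supported by both the Augustus and GeneWise sets in the same scaffold region. The Python sketch below illustrates one way such a filter could work. It assumes that "same region" means overlapping coordinates on the same scaffold, which is an assumption of this example and is not stated in the description above; the Model tuple, function names and coordinates are all hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal representation of a gene model (1-based, inclusive coordinates).
Model = namedtuple("Model", "scaffold start end name")

def overlaps(a, b):
    """True if two models share at least one base on the same scaffold."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end

def supported_by_both(augustus, genewise):
    """Keep GeneWise models that overlap at least one Augustus model."""
    by_scaffold = {}
    for m in augustus:
        by_scaffold.setdefault(m.scaffold, []).append(m)
    return [
        g for g in genewise
        if any(overlaps(g, a) for a in by_scaffold.get(g.scaffold, []))
    ]

# Toy data with made-up coordinates.
augustus = [Model("scaffold1", 100, 500, "aug1"), Model("scaffold2", 50, 80, "aug2")]
genewise = [
    Model("scaffold1", 450, 900, "gw1"),    # overlaps aug1, kept
    Model("scaffold1", 1000, 1200, "gw2"),  # no Augustus support, dropped
    Model("scaffold3", 50, 80, "gw3"),      # different scaffold, dropped
]
print([m.name for m in supported_by_both(augustus, genewise)])  # ['gw1']
```

A stricter criterion, such as a minimum reciprocal overlap or matching exon structure, could replace the overlaps function without changing the rest of the sketch.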

 

Functional Annotations

The following functional annotation files were derived from processing the hybrid gene set through InterProScan and the KEGG/KAAS services.  Genes are mapped to InterPro domains, GO terms, KEGG pathways and orthologs. This work was performed by the GDR team.

Downloads

Predicted genes mapped to KEGG Pathways Pyrus_communis_v1.0-genes2KEGG_pathways.txt
Predicted genes mapped to KEGG Orthologs Pyrus_communis_v1.0-genes2KEGG_orthologs.txt
Predicted genes mapped to IPR domains Pyrus_communis_v1.0-genes2IPR.txt
Predicted genes mapped to GO terms Pyrus_communis_v1.0.genes2GO.txt
KEGG Hierarchy file (for viewing with KegHier) Pyrus_communis_v1.0-KEGG_hier.tar.gz
KEGG map files Pyrus_communis_v1.0-KEGG_map.tar.gz
InterProScan raw output file Pyrus_communis_v1.0-interpro.raw.gz
InterProScan XML output file Pyrus_communis_v1.0-interpro.xml.gz

 

Predicted Repeats

De novo repeats were identified using RepeatScout and refined to remove redundancy using TEClass. RepeatMasker was then run with the de novo repeats and RepBase as input to identify putative repeats.

Downloads

RepeatMasker repeats (GFF3 file) Pyrus_communis_v1.0-RM_repeats.gff3.gz
RepeatMasker Summary (TXT file) Pyrus_communis_v1.0-RM_summary.txt
Repeat masked scaffolds, masked with N (FASTA file) Pyrus_communis_v1.0-scaffolds_N_masked.fna.gz
Repeat masked scaffolds, masked with X (FASTA file) Pyrus_communis_v1.0-scaffolds_X_masked.fna.gz
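
Repeat annotations often overlap, so the number of bases covered by repeats is not simply the sum of feature lengths. The Python sketch below shows one way to total the covered bases per scaffold from GFF3 lines by merging overlapping intervals. It relies only on the standard GFF3 column layout (sequence ID in column 1, 1-based inclusive start and end in columns 4 and 5); the function name covered_bases and the source, type and ID values in the toy records are hypothetical and may differ from the contents of the actual file.

```python
def covered_bases(gff3_lines):
    """Return {seqid: number of bases covered by at least one feature}."""
    intervals = {}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GFF3 feature line
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        intervals.setdefault(seqid, []).append((start, end))

    totals = {}
    for seqid, spans in intervals.items():
        spans.sort()
        total = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        totals[seqid] = total
    return totals

# Toy GFF3 records with made-up coordinates.
demo = [
    "##gff-version 3",
    "scaffold1\tRepeatMasker\trepeat\t1\t10\t.\t+\t.\tID=r1",
    "scaffold1\tRepeatMasker\trepeat\t5\t20\t.\t+\t.\tID=r2",
    "scaffold1\tRepeatMasker\trepeat\t31\t40\t.\t-\t.\tID=r3",
    "scaffold2\tRepeatMasker\trepeat\t100\t100\t.\t+\t.\tID=r4",
]
print(covered_bases(demo))  # {'scaffold1': 30, 'scaffold2': 1}
```

Dividing the summed totals by the assembly size gives an estimate of the repeat fraction, which can be compared against the figures in the RepeatMasker summary file.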

 

ncRNA

Predicted tRNAs were identified using tRNAscan-SE.

Downloads

tRNAscan-SE predicted tRNAs (GFF3 file) Pyrus_communis_v1.0-ncRNA.gff3.gz

 

SNPs

SNPs from the Illumina Infinium II IRSC SNP array v1 for apple and pear were mapped to the assembled scaffolds. Their positions can be found in the GFF3 files and their flanking sequences in the FASTA (.fna) files below.

Downloads

Apple Infinium SNPs mapped to P. communis v1.0, flanking sequences (FASTA file) Pyrus_communis_v1.0-SNPs_NCBI_apple.fna.gz
Apple Infinium SNPs mapped to P. communis v1.0, positions (GFF3 file) Pyrus_communis_v1.0-SNPs_NCBI_apple.gff3.gz
Pear Infinium SNPs mapped to P. communis v1.0, flanking sequences (FASTA file) Pyrus_communis_v1.0-SNPs_NCBI_pear.fna.gz
Pear Infinium SNPs mapped to P. communis v1.0, positions (GFF3 file) Pyrus_communis_v1.0-SNPs_NCBI_pear.gff3.gz