Pyrus communis Genome v1.0 Draft Assembly & Annotation
From this page you can browse and download the draft whole genome sequence, predicted gene models, functional annotations, and more from the Pyrus communis (European pear) draft genome assembly v1.0. Select a link in the Resources box for further details.
For this assembly, DNA was extracted from leaves of P. communis trees grown at Plant & Food Research in New Zealand and Foundation-Istituto Agrario di San Michele all’Adige in Italy. Two paired-end libraries were constructed, with 2kb and 7kb inserts respectively. Sequencing was performed using the GS FLX+ series with the GS FLX Titanium Sequencing Kit XL+, and the resulting reads were assembled using Roche's 454 assembler (version 2.7), generating 142,083 scaffolds greater than 499bp. This assembly represents 577.3Mb of the estimated 600Mb genome, with the largest scaffold at 1.2Mb. The scaffolds were masked using RepeatMasker (using the rosid clade from RepBase), and genes were predicted using a strategy employing Augustus, GeneWise, RNA-seq alignments and homology to other predicted Rosaceae genes. De novo repeats were identified using RepeatScout.
Chagné D, Crowhurst RN, Pindo M, Thrimawithana A, Deng C, Ireland H, Fiers M, Dzierzon H, Cestaro A, Fontana P, Bianco L, Lu A, Storey R, Knäbel M, Saeed M, Montanari S, Kim YK, Nicolini D, Larger S, Stefani E, Allan AC, Bowen J, Harvey I, Johnston J, Malnoy M, Troggio M, Perchepied L, Sawyer G, Wiedow C, Won K, Viola R, Hellens RP, Brewer L, Bus VG, Schaffer RJ, Gardiner SE, Velasco R. The Draft Genome Sequence of European Pear (Pyrus communis L. 'Bartlett'). PLoS ONE. 2014; 9(4):e92644.
Additional Annotations by the GDR Team
The predicted genes from this assembly were further annotated with InterPro domains, GO Terms, KEGG pathways and orthologs by the GDR Team using InterProScan and the KEGG KAAS online service.
All assembly and annotation files are available for download by selecting the desired data type in the right-hand sidebar. Each data type page provides a description of the available files and links to download them.
Sequencing was performed using the GS FLX+ series with the GS FLX Titanium Sequencing Kit XL+, and the resulting reads were assembled using Roche's 454 assembler (version 2.7), generating 142,083 scaffolds greater than 499bp. This assembly represents 577.3Mb of the estimated 600Mb genome, with the largest scaffold at 1.2Mb. Repeat-masked scaffolds are also provided below. See the 'Predicted Repeats' link for further information about the identification of repeats.
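The summary figures above (scaffold count, total assembled length, largest scaffold, and the share of the estimated genome covered) can be recomputed from a downloaded scaffold FASTA file. The following is a minimal sketch in Python, not the procedure used by the assembly authors. The function names are illustrative, and the defaults simply mirror the numbers quoted on this page (a 500bp minimum length and a 600Mb genome estimate).

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence_id, length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Use the first word of the header as the sequence identifier.
            name = parts[0] if parts else ""
            length = 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def assembly_summary(
    lines: Iterable[str],
    min_length: int = 500,            # "greater than 499bp" on this page
    genome_size: int = 600_000_000,   # estimated genome size quoted on this page
) -> Dict[str, float]:
    """Summarise scaffolds that are at least min_length bases long."""
    lengths = [n for _, n in read_fasta_lengths(lines) if n >= min_length]
    total = sum(lengths)
    return {
        "scaffolds": len(lengths),
        "total_bp": total,
        "largest_bp": max(lengths, default=0),
        "fraction_of_genome": total / genome_size,
    }
```

To use it, pass an open file handle for the scaffold FASTA (after decompressing it, if needed) to assembly_summary. With the figures quoted above, 577.3Mb of an estimated 600Mb corresponds to roughly 96% of the genome.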
Genes were predicted using a hybrid approach. First, RNA-seq reads generated for this project were assembled into transcript contigs using trans-ABySS. These transcript contigs were then used as a training set for de novo predictions using the tool Augustus. Augustus predictions were generated for both the repeat-masked and unmasked scaffold sequences. Next, predicted proteins from other Rosaceae species (e.g. apple, Chinese pear, peach and strawberry) were aligned to the repeat-masked scaffolds using tblastn. GeneWise was then used to predict gene models in regions of matched homology where matches were 79% identical or greater. Gene models from GeneWise were reviewed using Evigene and only the best models were retained. Finally, gene models were retained if they appeared in both the Augustus set and the GeneWise set and were present in the same repeat-masked scaffold regions.
Below, the final hybrid gene set for this v1.0 assembly is available in GFF3 and FASTA format. The Augustus gene models that were used to create the final hybrid set are also available.
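The final retention step of the hybrid approach, keeping only models supported by both the Augustus set and the GeneWise set in the same scaffold region, can be illustrated with a short Python sketch over GFF3 gene features. This is not the original pipeline. The overlap rule used here (any shared base on the same scaffold and strand) is an assumption, since the exact criterion is not stated on this page, and the names Gene, parse_genes and supported_by_both are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Gene(NamedTuple):
    seqid: str       # scaffold identifier (GFF3 column 1)
    start: int       # 1-based inclusive start (column 4)
    end: int         # 1-based inclusive end (column 5)
    strand: str      # "+" or "-" (column 7)
    attributes: str  # raw attribute string (column 9)


def parse_genes(gff3_lines: Iterable[str]) -> List[Gene]:
    """Collect 'gene' features from GFF3 lines, skipping comments and other types."""
    genes = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        genes.append(Gene(cols[0], int(cols[3]), int(cols[4]), cols[6], cols[8]))
    return genes


def supported_by_both(augustus: List[Gene], genewise: List[Gene]) -> List[Gene]:
    """Return Augustus genes that overlap a GeneWise gene on the same scaffold and strand.

    Assumption: one shared base counts as "the same region".
    """
    by_location: Dict[Tuple[str, str], List[Gene]] = defaultdict(list)
    for g in genewise:
        by_location[(g.seqid, g.strand)].append(g)

    kept = []
    for a in augustus:
        for g in by_location.get((a.seqid, a.strand), []):
            # Two closed intervals overlap when each starts before the other ends.
            if a.start <= g.end and g.start <= a.end:
                kept.append(a)
                break
    return kept
```

A stricter rule, such as a minimum reciprocal overlap or matching exon boundaries, could replace the interval test without changing the rest of the sketch.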
The following functional annotation files were derived from processing the hybrid gene set through InterProScan and the KEGG/KAAS services. Genes are mapped to InterPro domains, GO terms, KEGG pathways and orthologs. This work was performed by the GDR team.
De novo repeats were identified using RepeatScout and refined to remove redundancy using TEClass. RepeatMasker was then used to identify putative repeats, taking the de novo repeats and RepBase as input.
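To get a rough sense of how much of the assembly was masked, the masked bases in a repeat-masked scaffold FASTA file can be counted. The Python sketch below assumes only the general RepeatMasker conventions: hard-masked bases are written as N (or X), and soft-masked bases as lowercase letters. Which convention the files on this page use is not stated here, so both are counted separately. The function name masked_fraction is illustrative.

```python
from typing import Dict, Iterable


def masked_fraction(fasta_lines: Iterable[str]) -> Dict[str, float]:
    """Count hard-masked (N/X) and soft-masked (lowercase) bases in FASTA lines.

    Note: N also marks gaps between contigs within scaffolds, so the
    hard-masked count is an upper bound on masked repeat sequence.
    """
    total = hard = soft = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        total += len(line)
        for base in line:
            if base in "NnXx":
                hard += 1
            elif base.islower():
                soft += 1
    masked = hard + soft
    return {
        "total_bp": total,
        "hard_masked_bp": hard,
        "soft_masked_bp": soft,
        "masked_fraction": masked / total if total else 0.0,
    }
```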
SNPs from the Illumina Infinium II IRSC SNP array v1 for apple and pear were mapped to the assembled scaffolds. Their positions can be found in the GFF3 files, and their flanking sequences in the FASTA (.fna) files, below.
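As an illustration of how the mapped SNP positions might be read from a GFF3 file, the Python sketch below groups features by scaffold and sorts them by position. It relies only on the standard nine-column GFF3 layout. The feature type used for SNPs in these files is not stated on this page, so feature_type is left as an optional filter, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[unquote(key.strip())] = unquote(value.strip())
    return attrs


def snps_by_scaffold(
    gff3_lines: Iterable[str],
    feature_type: Optional[str] = None,  # e.g. "SNP"; None keeps every feature
) -> Dict[str, List[Tuple[int, str]]]:
    """Return {scaffold: [(position, marker_name), ...]} sorted by position."""
    positions: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # any embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        if feature_type is not None and cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        name = attrs.get("Name") or attrs.get("ID") or "."
        positions[cols[0]].append((int(cols[3]), name))
    for scaffold in positions:
        positions[scaffold].sort()
    return dict(positions)
```

The marker names returned here could then be matched against the record identifiers in the flanking-sequence FASTA files, assuming the two files share identifiers.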