The Malus x domestica genome v1.0 pseudo haplotype assemblies are a set of four assemblies (primary and alternates 1, 2 and 3) derived from the contigs of the original v1.0 assembly. These pseudo haplotypes are intended to divide overlapping contigs in the original assembly into four consensus sequences representing the four haplotypes present in the apple genome. However, these assemblies are not true haplotypes, hence the name "pseudo haplotypes". Currently, only the primary pseudo haplotype assembly is available on GDR. Gaps of 200,000 N's were used to space scaffolds...

Resource Titles: 
Assembly
Gene Predictions
Repeats
Downloads
Resource Blocks: 

Assembly files for the Malus x domestica genome v1.0 pseudo haplotypes are available in both GFF and FASTA format below.

For more information on the files available below, please see the description provided on the pseudo haplotype assembly details page.

Downloads

Primary haplotype pseudomolecules (FASTA file) Malus_x_domestica.v1.0-primary.pseudo.fa.gz
Primary haplotype pseudomolecules (GFF file) Malus_x_domestica.v1.0-primary.pseudo.gff3.gz
Primary haplotype scaffold alignments (GFF file) Malus_x_domestica.v1.0-primary.scaffolds.gff3.gz
Primary haplotype scaffold sequences (FASTA file) Malus_x_domestica.v1.0-primary.scaffolds.fa.gz
Primary haplotype contig alignments (GFF file) Malus_x_domestica.v1.0-primary.contigs.gff3.gz
Repeat Masked pseudomolecules (FASTA file) Malus_x_domestica.v1.0-primary_masked.fa.gz
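As a minimal sketch of how the gzipped FASTA downloads listed above might be inspected, the Python below reads records and reports each sequence's length together with the number of long runs of N. The function names are illustrative, not part of any GDR tool. The default threshold of 200,000 mirrors the gap size mentioned in the assembly description; how gaps are actually placed in the files is an assumption that should be checked against the data.

```python
import gzip
import re
from typing import Dict, Iterator, Tuple


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an open text-mode FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token as the record ID.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(seq: str, min_gap: int = 200_000) -> Dict[str, int]:
    """Return the sequence length and the number of N runs >= min_gap long."""
    long_gaps = [
        m for m in re.finditer(r"[Nn]+", seq) if m.end() - m.start() >= min_gap
    ]
    return {"length": len(seq), "long_gaps": len(long_gaps)}


def summarize_file(path: str, min_gap: int = 200_000) -> Dict[str, Dict[str, int]]:
    """Summarize every record in a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        return {name: summarize(seq, min_gap) for name, seq in read_fasta(handle)}
```

For example, summarize_file("Malus_x_domestica.v1.0-primary.pseudo.fa.gz") would return one summary per pseudomolecule, assuming the file has been downloaded to the working directory.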

 

The gene predictions for the pseudo haplotype assemblies are the same consensus set from the original assembly, but have been mapped to the pseudomolecules of the haplotypes.

5' and 3' UTR regions are currently not available for the gene models.

Downloads

Consensus gene model CDSs (FASTA) Malus_x_domestica.v1.0-primary.CDS.fa.gz
Consensus gene model proteins (FASTA) Malus_x_domestica.v1.0-primary.protein.fa.gz
Consensus gene model mRNA (FASTA) Malus_x_domestica.v1.0-primary.mRNA.fa.gz
Consensus gene models (transcripts) (GFF3) Malus_x_domestica.v1.0-primary.transcripts.gff3.gz

 

Repeats were predicted for the original v1.0 combined assembly using read depth information from the genome assembly contigs. The FASTA file of repeats was then used as a repeat library for RepeatMasker, which was used to predict repeats on the v1.0 primary haplotype assembly.

Downloads

Predicted repeats aligned to chromosomes (GFF file) Malus_x_domestica.v1.0-primary.repeats.gff3.gz
Predicted repeats aligned to chromosomes (FASTA file) Malus_x_domestica.v1.0-primary.repeats.fa.gz

 

All assembly and annotation files are available for download by selecting the desired data type in the right-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them.

Resource Titles: 
Library Information
Homology
Microsatellite Analysis
eSNP Summary
Contact
Publication
Downloads

Many sequencing projects around the world are depositing ESTs from the genus Malus in the NCBI dbEST database. However, not all of these ESTs are of high quality. To filter them, we cross-matched the public sequences against NCBI's UniVec database and used the BLAST sequence similarity algorithm to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences.

To reduce redundancy and create longer transcripts, we assembled these ESTs using the CAP3 program. For some sequences, we were able to obtain the o...

Resource Blocks: 
The Malus ESTs used for this assembly were downloaded on December 19th, 2005

 

 EST Libraries
 Number of ESTs available  202,888
 # of Species  3
 # of Libraries  89
 # of Tissues  41
 # of Development Stages  31

View detailed chart of libraries.

 Species
 Malus sieboldii  1,163
 Malus x domestica  197,781
 Malus x domestica x Malus sieversii  3,944
 

 

Homology was determined using the BLASTx algorithm for the Malus Contigs and Singlets vs. the Swiss-Prot and TrEMBL databases. Only matches with an E-value of 1.0 e-9 or better were recorded as significant. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy, and TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.
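The E-value filter described above can be sketched as follows. This is an illustration only: it assumes the BLASTx results are in the 12-column tab-separated layout written by BLAST+ with -outfmt 6 (query ID in column 1, subject ID in column 2, E-value in column 11), which may not be the format used for this analysis, and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-9  # threshold stated in the text; lower E-values are better


def best_hits(lines: Iterable[str],
              cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, the hit with the lowest E-value at or below cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip comment lines and rows that do not have the assumed 12 columns.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def percent_with_match(n_matched: int, n_total: int) -> float:
    """Percentage of sequences with at least one recorded match."""
    return round(100.0 * n_matched / n_total, 1)
```

Applying percent_with_match to the contig counts reported for this assembly (12,401 of 22,435 contigs with a Swiss-Prot match) gives 55.3.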

 Homology of Malus Contigs
 Number of Contigs  22,435
 Number (%) of Contigs with a Match in Swiss-Prot Database  12,401 (55.3%)
 Number (%) of Contigs with a Match in TrEMBL Database  19,316 (86.1%)

 

 Homology of Malus Singlets
 Number of Singlets  45,556
 Number (%) of Singlets with a Match in Swiss-Prot Database  14,698 (32.3%)
 Number (%) of Singlets with a Match in TrEMBL Database  20,673 (45.4%)
 

 

The type and frequency of simple sequence repeats in this unigene assembly (v2) were determined using the CUGIssr.pl program. For these searches, SSRs are defined as dinucleotides repeated at least 5 times, trinucleotides repeated at least 4 times, tetranucleotides repeated at least 3 times, or pentanucleotides repeated at least 3 times.
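The SSR definition above can be expressed as a small regular-expression search. The sketch below is not the CUGIssr.pl implementation, whose internals are not described here; it only applies the stated thresholds. It additionally assumes that a motif which is itself a repeat of a shorter unit (for example "AA" or "ATAT") should not be counted at the longer motif length, and it does not merge hits that overlap between motif lengths.

```python
import re
from typing import List, NamedTuple

# Minimum repeat counts from the text: motif length -> minimum number of repeats.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq: str) -> List[SSR]:
    """Find perfect SSRs that meet the per-motif-length repeat thresholds."""
    seq = seq.upper()
    hits: List[SSR] = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                repeats = (m.end() - m.start()) // k
                hits.append(SSR(motif, repeats, m.start(), m.end()))
    return sorted(hits, key=lambda h: h.start)
```

With these thresholds, find_ssrs("GGATATATATATCC") reports one dinucleotide repeat (AT repeated 5 times), while a run of three ATG units falls below the trinucleotide minimum of 4 and is not reported.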

 

 Sequence information
 Number of Sequences  195553
 Number of Sequences Having One Or More SSRs  35504
 Percentage of Sequences Having One Or More SSRs  18.2%
 Total Number of SSRs Found  44472
 Number of Motifs  636

 

Frequency of Motif Type

 

 Motif Length  Frequency  Percentage Frequency
 2bp  20647  46.4%
 3bp  16846  37.9%
 4bp  5562  12.5%
 5bp  1417  3.2%
 

 

The type and frequency of single nucleotide polymorphisms in this unigene assembly (V2) were determined using the AutoSNP software package (Savage et al., 2005).

View autoSNP output:

 

 SNP Summary
 Number of Contigs  22,435
 Number of SNPs  10,426
 Consensus Size  19,101,680 bp
 SNP Frequency  0.05/100 bp
 Total Transitions  4,759
 Total Transversions  2,732
 Total Indels  2,935
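The figures in the SNP summary are related by simple arithmetic, which the short sketch below reproduces. The function name is illustrative, and the transition/transversion ratio is a derived value that does not appear in the table itself.

```python
def snp_summary(transitions: int, transversions: int, indels: int,
                consensus_bp: int) -> dict:
    """Recompute summary values from the three polymorphism counts."""
    total = transitions + transversions + indels
    return {
        "total_snps": total,
        # SNPs per 100 bp of consensus sequence.
        "per_100_bp": round(100.0 * total / consensus_bp, 2),
        # Derived here for illustration; not reported in the table.
        "ts_tv_ratio": round(transitions / transversions, 2),
    }


summary = snp_summary(4759, 2732, 2935, 19_101_680)
print(summary)
```

For the counts above, the three categories sum to the 10,426 SNPs reported, and the frequency works out to 0.05 per 100 bp.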
 

 

 Contact Details
 Name  Main, Dorrie
 Lab  Department of Horticulture
 Organization  Washington State University
 Address  45 Johnson Hall, Pullman, WA 99164
 Telephone  509-335-2774
 Fax  509-335-8690
 Email  dorrie@wsu.edu

 

No publications are currently available.

Sequence Files:
Blast Result Files:
Microsatellite Files:

 

Resource Titles: 
Library Information
Homology
Contig GO Terms
Microsatellite Analysis
eSNP Summary
Contact
Publication
Download

Many sequencing projects around the world are depositing ESTs from the genus Malus in the NCBI dbEST database. All the Malus ESTs from GenBank on July 14, 2006 were included in this assembly. However, not all of these ESTs are of high quality. To filter, we crossmatched the public sequences against NCBI's UniVec database and used the BLAST sequence similarity algorithm to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences.

To reduce redundancy and create longer transcripts, we assembled these ESTs using the CAP3 program. The parameters used for CAP...

Resource Blocks: 
The Malus ESTs used for this assembly were downloaded on June 14th, 2006

 

 EST Libraries
 Number of ESTs available  250907
 # of Species  4
 # of Libraries  97
 # of Tissues  18
 # of Development Stages  33

View detailed chart of libraries.

 Species
 Malus hybrid rootstock  320
 Malus sieboldii  1126
 Malus x domestica  245545
 Malus x domestica x Malus sieversii  3916
 

 

Homology was determined using the BLASTx algorithm for the Malus Contigs and Singlets vs. the Swiss-Prot and TrEMBL databases. Only matches with an E-value of 1.0 e-9 or better were recorded. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy, and TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.

 Homology of Malus Contigs
 Number of Contigs  23868
 Number (%) of Contigs with a Match in Swiss-Prot Database  13340 (55.9%)
 Number (%) of Contigs with a Match in TrEMBL Database  20564 (86.2%)

 

 Homology of Malus Singlets
 Number of Singlets  58982
 Number (%) of Singlets with a Match in Swiss-Prot Database  21842 (37.0%)
 Number (%) of Singlets with a Match in TrEMBL Database  37101 (62.9%)
 

 

The GO terms (www.geneontology.org) were determined by comparing the contigs against Swiss-Prot using BLAST. The Sprot2GO annotation file was then used to map GO terms to the sequences, using only relevant matches (E-value threshold of 1e-9).
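The mapping step described above might look like the following sketch. The layout of the Sprot2GO file is not described here, so the code assumes it has already been loaded into a dictionary from Swiss-Prot accession to a list of GO IDs. The function names, the accession placeholders in the usage note and the single-letter aspect codes are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_go_terms(best_hits: Dict[str, Tuple[str, float]],
                 sprot_to_go: Dict[str, Iterable[str]],
                 cutoff: float = 1e-9) -> Dict[str, List[str]]:
    """Assign GO terms to each contig through its Swiss-Prot match."""
    contig_go = defaultdict(set)
    for contig, (accession, evalue) in best_hits.items():
        if evalue <= cutoff:  # only matches passing the stated threshold
            contig_go[contig].update(sprot_to_go.get(accession, ()))
    # Drop contigs whose match carries no GO annotation.
    return {c: sorted(terms) for c, terms in contig_go.items() if terms}


def count_by_aspect(contig_go: Dict[str, List[str]],
                    term_aspect: Dict[str, str]) -> Dict[str, int]:
    """Count contigs having at least one term in each GO aspect."""
    counts = defaultdict(int)
    for terms in contig_go.values():
        for aspect in {term_aspect[t] for t in terms if t in term_aspect}:
            counts[aspect] += 1
    return dict(counts)
```

Here best_hits maps a contig name to its matching accession and E-value, for instance {"contig1": ("ACC1", 1e-20)}, and term_aspect maps each GO ID to Biological Process, Cellular Component or Molecular Function. Counting per aspect in this way yields totals of the kind listed below.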

 

6492 Contigs have Biological Process annotation:
 

7570 Contigs have Cellular Component annotation:
 

9322 Contigs have Molecular Function annotation:
 

 

 


 


 

 

 

The type and frequency of simple sequence repeats in this unigene assembly (v3) were determined using the CUGIssr.pl program. For these searches, SSRs are defined as dinucleotides repeated at least 5 times, trinucleotides repeated at least 4 times, tetranucleotides repeated at least 3 times, or pentanucleotides repeated at least 3 times.

 

 Sequence information
 Number of Sequences  250907
 Number of Sequences Having One Or More SSRs  46663
 Percentage of Sequences Having One Or More SSRs  18.6%
 Total Number of SSRs Found  58319
 Number of Motifs  657

 

Frequency of Motif Type

 

 Motif Length  Frequency  Percentage Frequency
 2bp  27350  46.9%
 3bp  21818  37.4%
 4bp  7437  12.8%
 5bp  1714  2.9%
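The percentage column in the table above follows directly from the counts: each motif-length frequency is divided by the total number of SSRs found. A minimal sketch with an illustrative function name:

```python
def percentage_frequency(counts: dict) -> dict:
    """Convert per-motif-length counts into percentages of the total."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}


motif_counts = {"2bp": 27350, "3bp": 21818, "4bp": 7437, "5bp": 1714}
print(sum(motif_counts.values()))
print(percentage_frequency(motif_counts))
```

The four counts sum to the 58319 SSRs reported for this assembly, so the percentages account for all detected repeats.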
 

 

The type and frequency of single nucleotide polymorphisms in this unigene assembly (V3) were determined using the AutoSNP software package (Savage et al., 2005).

View autoSNP output:

 

 SNP Summary
 Number of Contigs  23868
 Number of SNPs  14298
 Consensus Size  20360530 bp
 SNP Frequency  0.07/100 bp
 Total Transitions  7060
 Total Transversions  3836
 Total Indels  3402
 

 

 Contact Details
 Name  Main, Dorrie
 Lab  Department of Horticulture
 Organization  Washington State University
 Address  45 Johnson Hall, Pullman, WA 99164
 Telephone  509-335-2774
 Fax  509-335-8690
 Email  dorrie@wsu.edu

 

No publications are currently available.

Sequence Files:
Blast Result Files:
Microsatellite Files: