Apple Genome v1.0 Frequently Asked Questions
Table of Contents
- Why are there no sequences for each chromosome?
- What is the easiest way to get a sequence for a few contigs?
- How do I open the files for download on the apple genome project page?
- What is the difference between the consensus gene set and alternate gene set?
Why are there no sequences for each chromosome?
Chromosome-level sequences are not provided because the Malus x domestica genome is heterozygous. During assembly, reads from different haplotypes assembled into distinct, overlapping contigs. Rather than choosing one haplotype over another to form a consensus sequence, the contigs were organized into clusters (metacontigs), which in turn were aligned to the chromosomes using markers from the genetic map. Providing the genome as ordered contigs therefore preserves the haplotype information inherent in the sequences. More information about the assembly process can be found in the whole-genome publication and its supplemental material.
What is the easiest way to get a sequence for a few contigs?
A complete FASTA file of contig sequences can be obtained from the Apple Genome v1.0 page here on GDR, and the contigs of interest can be extracted from that file. Alternatively, GBrowse can be used to retrieve a few sequences one at a time. To obtain a sequence with GBrowse, first search for the contig of interest using the search box labeled 'Landmark or Region'. Next, click the contig. A new page or tab will open containing a short description of the contig followed by its FASTA-formatted sequence.
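For the first approach (extracting contigs from the complete FASTA file), a minimal Python sketch is shown below. It assumes that the contig ID is the first whitespace-delimited word of each FASTA header line. The function names, the file name `apple_contigs.fasta` in the comment, and the contig IDs in the demo are hypothetical placeholders, not the actual names used on GDR.

```python
import io
import sys
from typing import Dict, Iterable, Iterator, Set, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()  # also drops a '\r' left by Windows line endings
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_contigs(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Keep only records whose ID (first word of the header) is in `wanted`."""
    found = {}
    for header, seq in read_fasta(lines):
        words = header.split()
        if words and words[0] in wanted:
            found[words[0]] = seq
    return found


def write_fasta(records: Dict[str, str], out: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for name, seq in records.items():
        out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical demo data. With a real download you would pass an open file,
    # e.g. with open("apple_contigs.fasta") as fh: extract_contigs(fh, ids)
    demo = io.StringIO(">contig_A some description\nACGT\nTTGG\n>contig_B\nGGGG\n")
    selected = extract_contigs(demo, {"contig_A"})
    write_fasta(selected, sys.stdout)
```

Because the whole file is streamed line by line and only the requested records are kept, this approach also works for a large contig file without loading every sequence into memory.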
How do I open the files for download on the apple genome project page?
After downloading and uncompressing the files available on the apple genome project page, you will find files with various extensions, such as '.nuc', '.gff' and others. All of these files, with the exception of the Excel files, are plain text and can be opened with any text editor. Because UNIX and Windows operating systems mark line endings in plain text differently, the files may not display properly in the Windows 'Notepad' program. However, any other text editor in Windows, including WordPad, should open the files. Files with the extensions '.nuc' and '.pep' are FASTA files containing nucleotide and protein sequences, respectively. Files with a '.gff' or '.gff3' extension are in GFF format.
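GFF3 files are tab-delimited, with nine columns per feature line (seqid, source, type, start, end, score, strand, phase, attributes). As an illustration, the Python sketch below reads such a file: it strips both UNIX ('\n') and Windows ('\r\n') line endings, skips '#' comment and directive lines, and stops at a '##FASTA' directive. It is a minimal reader written for this FAQ, not a validated GFF3 parser; the class and function names and the example record are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional
from urllib.parse import unquote


@dataclass
class GffFeature:
    seqid: str
    source: str
    feature_type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Split 'key=value;key=value' pairs and decode %XX escapes.

    Multi-valued attributes (e.g. 'Parent=a,b') are kept as a single string.
    """
    attrs: Dict[str, str] = {}
    if field == ".":
        return attrs
    for pair in field.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[unquote(key)] = unquote(value)
    return attrs


def read_gff3(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield one GffFeature per feature line of a GFF3 file."""
    for line in lines:
        line = line.rstrip("\r\n")  # handle UNIX and Windows line endings
        if line.startswith("##FASTA"):
            break  # everything after this directive is FASTA sequence
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or directive
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        yield GffFeature(
            seqid=seqid,
            source=source,
            feature_type=ftype,
            start=int(start),
            end=int(end),
            score=None if score == "." else float(score),
            strand=strand,
            phase=None if phase == "." else int(phase),
            attributes=parse_attributes(attrs),
        )


if __name__ == "__main__":
    # Made-up record with Windows line endings, for illustration only.
    text = ("##gff-version 3\r\n"
            "ctg1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=demo\r\n")
    for feature in read_gff3(io.StringIO(text)):
        print(feature.seqid, feature.feature_type, feature.start, feature.end,
              feature.attributes)
```

Opening a file with Python's built-in `open()` in text mode already normalizes line endings by default; the explicit `rstrip` simply keeps the function safe for any other iterable of lines.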
What is the difference between the consensus gene set and alternate gene set?
As explained on the 'Gene' sub-page of the genome page, gene models were predicted on the assembly contigs using several gene prediction tools, including Glimmer, FGENESH, Genewise, GMAP and Twinscan. A final consensus gene set was constructed from the evidence produced by all of these tools. The alternate gene sets contain all of the genes predicted by the individual tools. The supplemental data for the Apple Genome publication provide a more detailed description of how the gene models were constructed.