Gene Naming Guideline



Genes in GDR are of two types:

  1. Genes with a gene symbol assigned by individual investigators, usually with a known or predicted function or phenotype.
  2. Genes predicted from the whole genome assembly.

1. Naming Guideline for Genes with Known or Predicted Function or Phenotype

(for gene data submission and gene nomenclature in manuscript)

The guideline has been put together by the Rosaceae Gene Name Standardization Subcommittee of RosEXEC/RosIGI.

A gene symbol is composed of:

[species prefix][3-letter code class symbol].[numeric suffix for a gene][-|_][numeric suffix for an allele or a splice variant]

Where

  • [species prefix]: for publication purposes, a 3-letter prefix for major species and a 5-letter prefix for others
  • [3-letter code class symbol]: a 3-letter class symbol for closely related genes
  • [numeric suffix for a gene]: a numeric suffix identifying the gene. If the gene symbol does not itself contain a number, the numeric suffix can be attached directly, without the preceding period.
  • [-|_]: for publication purposes, a hyphen for alleles and an underscore for splice variants
  • [numeric suffix for an allele or a splice variant]

Examples:

  • When the gene symbols do not contain a number: PG1, PG2
  • When the gene symbols do contain a number: DHN3.1, DHN3.2
  • alleles: DHN3.1-1, DHN3.1-2
  • splice variants: DHN3.1_1, DHN3.1_2
  • to compare genes from multiple species: PpeDHN3.1, MdoDHN3.1
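
The composition rule above can be sketched in a few lines of Python. This is only an illustration: the function name `format_gene_symbol` and its parameters are hypothetical, and the sketch assumes that a symbol carries either an allele suffix or a splice variant suffix, not both, since the pattern shows a single [-|_] slot.

```python
def format_gene_symbol(class_symbol, gene_number, species_prefix="",
                       allele=None, splice_variant=None):
    """Assemble a gene symbol from its parts (hypothetical helper)."""
    # Assumption: the pattern has one [-|_] slot, so only one suffix is allowed.
    if allele is not None and splice_variant is not None:
        raise ValueError("give either an allele or a splice variant, not both")

    # The period is needed only when the class symbol itself contains a number
    # (DHN3 + 1 -> DHN3.1); otherwise the number is attached directly (PG + 1 -> PG1).
    separator = "." if any(ch.isdigit() for ch in class_symbol) else ""
    symbol = f"{species_prefix}{class_symbol}{separator}{gene_number}"

    if allele is not None:
        symbol += f"-{allele}"          # hyphen marks an allele
    elif splice_variant is not None:
        symbol += f"_{splice_variant}"  # underscore marks a splice variant
    return symbol


print(format_gene_symbol("PG", 1))                           # PG1
print(format_gene_symbol("DHN3", 1, allele=2))               # DHN3.1-2
print(format_gene_symbol("DHN3", 1, species_prefix="Ppe"))   # PpeDHN3.1
```

Building symbols from their parts avoids the ambiguity of parsing them, since a string such as PpeDHN3.1 cannot be split into prefix and class symbol without a list of known species prefixes.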

Please refer to the detailed guideline page for more information.

 

2. Naming Guideline for Predicted Genes from Whole Genome Assembly

The guideline closely follows the nomenclature used in other plant communities, such as Arabidopsis and rice.
 

Genes

[prefix].[*|m|p][2 digit chromosome #]g[6 digit id increments by 10]

Where

  • [prefix]: a unique 3-5 character identifier that references the project or species.
  • [*|m|p]: * means no character and is used for nuclear genes, m is used for mitochondrial genes, and p is used for plastid genes
  • [2 digit chromosome #]: optional if sequences are contigs or scaffolds. 
  • g is for gene
  • [6 digit id increments by 10]: every gene must have a unique number regardless of the chromosome.
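
As a rough illustration of the pattern above, the following Python sketch assembles a gene ID from its parts. The function name `format_gene_id` and its parameters are hypothetical, and the sketch assumes that the n-th gene in the assembly receives the number n × 10, which matches the examples in this guideline but is not stated explicitly.

```python
def format_gene_id(prefix, serial, chromosome=None, organelle=""):
    """Build a predicted-gene ID (hypothetical helper).

    serial     -- 1-based position of the gene across the whole assembly
    chromosome -- integer chromosome number, or None for contigs/scaffolds
    organelle  -- "" for nuclear, "m" for mitochondrial, "p" for plastid
    """
    if organelle not in ("", "m", "p"):
        raise ValueError("organelle must be '', 'm' or 'p'")
    if not 1 <= serial <= 99999:
        raise ValueError("serial does not fit in a 6 digit id")

    # Assumption: ids increase in steps of 10, starting at 000010.
    number = serial * 10
    # The chromosome part is simply left out when no chromosome is known.
    chrom = "" if chromosome is None else f"{chromosome:02d}"
    return f"{prefix}.{organelle}{chrom}g{number:06d}"


print(format_gene_id("Fv", 1, chromosome=1))                 # Fv.01g000010
print(format_gene_id("Fv", 1, chromosome=1, organelle="m"))  # Fv.m01g000010
print(format_gene_id("Fv", 2))                               # Fv.g000020
```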

Examples with chromosome numbers:

nuclear gene: Fv.01g000010
mitochondrial gene: Fv.m01g000010
plastid gene: Fv.p01g000010
 

Examples without chromosome numbers:

nuclear gene: Fv.g000010
mitochondrial gene: Fv.mg000010
plastid gene: Fv.pg000010
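
IDs in this format can be checked and split with a regular expression. The Python sketch below is one possible way to do so. The names `GENE_ID` and `parse_gene_id` are hypothetical, and the pattern assumes the prefix consists of letters only; it accepts a prefix of any length because the examples use the 2-letter prefix Fv.

```python
import re

# Assumption: the prefix is a run of letters. The chromosome part is optional.
GENE_ID = re.compile(
    r"^(?P<prefix>[A-Za-z]+)\."
    r"(?P<organelle>[mp]?)"
    r"(?P<chromosome>\d{2})?"
    r"g(?P<number>\d{6})$"
)

ORGANELLE_NAMES = {"": "nuclear", "m": "mitochondrial", "p": "plastid"}


def parse_gene_id(gene_id):
    """Split a predicted-gene ID into its parts (hypothetical helper)."""
    match = GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"not a valid gene ID: {gene_id!r}")
    chromosome = match.group("chromosome")
    return {
        "prefix": match.group("prefix"),
        "genome": ORGANELLE_NAMES[match.group("organelle")],
        # None means the sequence is a contig or scaffold.
        "chromosome": int(chromosome) if chromosome is not None else None,
        "number": int(match.group("number")),
    }


print(parse_gene_id("Fv.m01g000010"))
print(parse_gene_id("Fv.pg000010"))
```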

mRNA 

[prefix].[*|m|p][2 digit chromosome #]g[6 digit id increments by 10].m[2 digit variant]

Where

  • [prefix]: a unique 3-5 character identifier that references the project or species.
  • [*|m|p]: * means no character and is used for nuclear genes, m is used for mitochondrial genes, and p is used for plastid genes
  • [2 digit chromosome #]: optional if sequences are contigs or scaffolds. 
  • g is for gene
  • [6 digit id increments by 10]: every gene must have a unique number regardless of the chromosome.
  • m is for mRNA
  • [2 digit variant]: a unique 2 digit number for the splice variants within the gene model.

Examples with chromosome numbers:

nuclear gene: Fv.01g000010.m01

 

Examples without chromosome numbers:

nuclear gene: Fv.g000010.m01
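
Because an mRNA ID is the gene ID followed by .m and a 2 digit variant number, the parent gene can be recovered from the mRNA ID alone. The Python sketch below illustrates this. The names `MRNA_ID`, `split_mrna_id` and `group_by_gene` are hypothetical, and the pattern assumes a letters-only prefix of any length, as in the Fv examples.

```python
import re

# Gene ID part followed by ".m" and a 2 digit splice variant number.
# Assumption: the prefix is a run of letters.
MRNA_ID = re.compile(
    r"^(?P<gene_id>[A-Za-z]+\.[mp]?(?:\d{2})?g\d{6})\.m(?P<variant>\d{2})$"
)


def split_mrna_id(mrna_id):
    """Return (gene ID, variant number) for an mRNA ID (hypothetical helper)."""
    match = MRNA_ID.match(mrna_id)
    if match is None:
        raise ValueError(f"not a valid mRNA ID: {mrna_id!r}")
    return match.group("gene_id"), int(match.group("variant"))


def group_by_gene(mrna_ids):
    """Collect the splice variant numbers listed for each gene."""
    genes = {}
    for mrna_id in mrna_ids:
        gene_id, variant = split_mrna_id(mrna_id)
        genes.setdefault(gene_id, []).append(variant)
    return genes


print(split_mrna_id("Fv.01g000010.m01"))
print(group_by_gene(["Fv.01g000010.m01", "Fv.01g000010.m02", "Fv.g000020.m01"]))
```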