Genome annotation is the description of a gene in terms of its structure, function, products, regulation, and other enriching information. “The ‘unit’ of genome annotation is the description of an individual gene and its protein (or RNA) product, and the focal point of each such record is the function assigned to the gene product. The record may also include a brief description of the evidence for this assigned function.” (1) One of the elements that can be annotated to a gene is the biological pathway, or pathways, in which the gene or its product participates.
Gene enrichment analysis is a statistical method for identifying annotations, such as pathways, that are over-represented in a list of genes or proteins sharing a common characteristic (e.g. genes found to be differentially expressed in a microarray experiment); such a gene list could be associated with a disease. Note that over-represented is not the same as overexpressed: it means the annotation occurs in the list more often than expected by chance. The analysis compares the annotations attached to the gene set against a background pool of genes in order to understand the biological processes the genes are involved in. For each annotated pathway, the test calculates a statistical significance, and pathways that pass the chosen significance threshold are reported as enriched for the gene list. By performing this analysis, scientists can determine whether the genes share a common characteristic by finding the annotated pathways most strongly associated with the group. The significance of each pathway is calculated from the hypergeometric distribution: the probability of finding at least as many pathway genes in the list as were actually observed, if the list had been drawn at random from the background pool.
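As an illustration, the hypergeometric test for a single pathway can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure of any particular tool: the function name `enrichment_p_value`, the parameter names, and the gene counts in the usage example are all hypothetical, and the background pool is assumed to be the full set of annotated genes.

```python
from math import comb


def enrichment_p_value(pool_size, pathway_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    pool_size    -- genes in the background pool (assumed: all annotated genes)
    pathway_size -- genes in the pool annotated to the pathway
    list_size    -- genes in the list of interest
    overlap      -- genes in the list that are annotated to the pathway
    """
    # The overlap can never exceed the pathway size or the list size.
    max_overlap = min(pathway_size, list_size)
    # Number of ways to draw a list of this size from the pool.
    all_draws = comb(pool_size, list_size)
    # Count draws containing exactly i pathway genes, for every i >= overlap.
    tail_draws = sum(
        comb(pathway_size, i) * comb(pool_size - pathway_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail_draws / all_draws


# Hypothetical numbers: 20,000 genes in the pool, 100 in the pathway,
# a list of 200 genes, 8 of which belong to the pathway.
p = enrichment_p_value(20000, 100, 200, 8)
print(f"p-value: {p:.3g}")
```

A small p-value suggests that the pathway is over-represented in the list. In practice the test is repeated for every annotated pathway, so the resulting p-values usually need a multiple-testing correction before pathways are called significant.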
1. Koonin EV, Galperin MY. Sequence – Evolution – Function: Computational Approaches in Comparative Genomics. Boston: Kluwer Academic; 2003.