Chemical Gene Ontology Enriched Associations

$447.50 / year

This dataset contains the results of Gene Ontology (GO) enrichment analyses performed on groups of genes that are affected in some way by a chemical. The analyses were run with the GO-TermFinder tool, which identifies GO terms shared among the genes. These shared terms support inferences about the biological processes, molecular functions, or cellular components that may be involved in the chemical's effect on the genes and/or in the mechanism of disease.


Gene Ontology, developed by the Gene Ontology Consortium, is a system for describing the functions and processes of genes, once these are identified, by annotating the genes to a set of standardized terms and vice versa. The system also makes it possible to find characteristics shared between genes (common denominators) by performing a Gene Ontology enrichment analysis.

Genome annotation is the description of a gene in terms of its structure, function, products, regulation, and other enriching information. “The ‘unit’ of genome annotation is the description of an individual gene and its protein (or RNA) product, and the focal point of each such record is the function assigned to the gene product. The record may also include a brief description of the evidence for this assigned function.” (1)

Gene Ontology (GO) terms aid gene annotation by providing standardized descriptions: GO terms are annotated to genes (or vice versa) in the Gene Ontology database. This makes it possible to search for genes that share common characteristics (described with GO terms) and to run objective analyses on these characteristics.

Gene enrichment is a statistical analysis that identifies over-represented genes in a large pool of genes or proteins that share a common characteristic (e.g., from a microarray experiment); these over-represented genes could be associated with the disease. GO enrichment statistically compares GO terms with the gene set to understand the biological processes those genes take part in. The test determines whether a GO term is enriched for the genes, that is, whether it is annotated to the gene set more often than expected by chance.

GO-TermFinder is a tool that retrieves Gene Ontology information and analyzes the GO terms annotated to a list of genes (for example, from a microarray experiment). It calculates the statistical significance of each annotation to find significantly enriched GO terms associated with the gene list. With this analysis, scientists can determine whether the genes share a common characteristic by finding the terms most strongly associated (by annotation significance) with the group. GO-TermFinder uses the hypergeometric distribution to calculate the significance of the annotated GO terms.
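The hypergeometric test described above can be sketched in a few lines of Python. This is a minimal illustration, not GO-TermFinder's own code: the function name `enrichment_p_value` is hypothetical, and the sketch assumes the p-value is the upper-tail probability of seeing at least the observed number of annotated genes. The four arguments correspond to the dataset fields Target_Match_Quantity, Target_Total_Quantity, Background_Match_Quantity and Background_Total_Quantity.

```python
from math import comb


def enrichment_p_value(target_match, target_total, background_match, background_total):
    """Upper-tail hypergeometric probability P(X >= target_match).

    Assumption: the p-value is the chance of drawing at least `target_match`
    annotated genes when `target_total` genes are sampled without replacement
    from `background_total` genes, of which `background_match` are annotated.
    """
    max_hits = min(target_total, background_match)
    all_draws = comb(background_total, target_total)
    favourable = sum(
        comb(background_match, k)
        * comb(background_total - background_match, target_total - k)
        for k in range(target_match, max_hits + 1)
    )
    # Python divides arbitrarily large integers exactly before rounding to float.
    return favourable / all_draws


# Values taken from the "KB 5-21 / sleep" sample row of this dataset:
# 2 of 2 target genes annotated, 33 of 42937 background genes annotated.
print(f"{enrichment_p_value(2, 2, 33, 42937):.2e}")  # 5.73e-07
```

Under these assumptions the result agrees with the P_Value reported for that sample row, which suggests the four quantity fields are sufficient to recompute the raw p-value.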

1. Koonin EV, Galperin MY. Sequence – Evolution – Function: Computational Approaches in Comparative Genomics. Boston: Kluwer Academic; 2003.

Date Created


Last Modified




Update Frequency


Temporal Coverage


Spatial Coverage



John Snow Labs; Comparative Toxicogenomics Database

Source License URL

Source License Requirements

Publicly available and free for research use, but citation is required. Permission must be requested for commercial use.

Source Citation

Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database update 2017. Nucleic Acids Res. 2016 Sep 19;[Epub ahead of print]


Toxicogenomics, Gene Disease Association, Gene Chemical Pathways, Gene Enrichment, Gene Ontology Enrichment Analyses, Gene Ontology Terms, Comparative Toxicogenomics Database, Relationships Between Chemicals and Diseases, Chemical and Disease Inferences, Chemical Disease Hypotheses

Other Titles

Chemical Effect Integrated into GO Pathways, GO Pathways Response to Chemical Exposure, GO Pathway and Genes Affected Upon Chemical Exposure, Chemical GO Enriched Associations Gene Analysis, Chemical GO Enriched

| Field | Description | Type | Constraints |
| --- | --- | --- | --- |
| Chemical_Name | Name of the chemical associated with the group of genes analyzed for Gene Ontology enrichment. | string | required : 1 |
| Chemical_ID | Identifier of the chemical in the US National Library of Medicine's Medical Subject Headings (MeSH). MeSH is a controlled vocabulary of thousands of biomedical terms that standardizes the terminology used in life-science publications. Each MeSH term has a unique identifier 7 to 8 characters long. The MeSH unique identifier was changed to a 10-character length after November 2013. | string | required : 1 |
| Cas_Registry_Name | Unique numeric identifier assigned by CAS to the chemical substance. The CAS registry number also serves as a reference for finding information on the specific chemical. CAS is a division of the American Chemical Society (ACS); the CAS registry collects information on millions of chemical substances identified since the early 1900s. | string | - |
| Gene_Ontology | Gene Ontology (GO) class of the term. GO is a controlled vocabulary used to describe gene function and properties; it collects concepts and the relationships between them. GO divides gene functions into three main classes. Molecular function: names for activities performed at the molecular level by individual gene products; these are often appended with the word “activity” to avoid confusion with gene product names (e.g., adenylate cyclase activity). Cellular component: names for structures inside the cell, or structures at the molecular level formed by groups of gene products (groups of proteins). Biological process: names for a series of steps that lead to a biological change, including pathways and other processes in which the activities of multiple gene products take part. | string | required : 1 |
| GO_Term_Name | Gene Ontology term that describes the molecular function, biological process, or cellular component. | string | required : 1 |
| GO_Term_ID | Alphanumeric identifier of the GO term. The GO term ID is used to browse GO terms in the Gene Ontology database. The GO database is a relational database comprising the GO ontologies as well as the annotations of genes and gene products to terms in those ontologies. The GO database is the source of all data available through the legacy AmiGO 1.8 browser and search engine. | string | required : 1 |
| Highest_GO_Level | The highest level to which the GO term is assigned within the GO hierarchical ontology. Many GO terms are located at multiple levels within the ontology; only the highest level is displayed. Level 1 constitutes “children” of the most general Biological Process, Cellular Component, and Molecular Function terms. The structure of GO can be described as a graph, where each GO term is a node and the relationships between terms are edges between the nodes. GO is loosely hierarchical, with “child” terms being more specialized than their “parent” terms, but unlike a strict hierarchy, a term may have more than one parent term (note that the parent/child model does not hold true for all types of relation). For example, the biological process term “hexose biosynthetic process” has two parents, “hexose metabolic process” and “monosaccharide biosynthetic process.” This is because “biosynthetic process” is a subtype of “metabolic process” and a “hexose” is a subtype of “monosaccharide.” | integer | level : Ordinal |
| P_Value | Raw p-value. The p-value indicates how significant the GO term is for the group of genes related to the chemical; the closer it is to zero, the less likely it is that the genes share the GO term by chance alone. The p-value is calculated with the hypergeometric distribution, which compares the GO terms shared by the genes with the background distribution of the annotation; the inputs to the formula are Target_Match_Quantity, Target_Total_Quantity, Background_Match_Quantity, and Background_Total_Quantity. | string | - |
| Corrected_P_Value | P-value after applying the Bonferroni adjustment. The Bonferroni correction compensates for multiple testing (many GO terms are tested against the same group of genes) and gives a more conservative significance estimate. | string | - |
| Target_Match_Quantity | Number of genes that interact with the chemical and are annotated to the GO term. | integer | level : Ratio |
| Target_Total_Quantity | Total number of genes that interact with the chemical. | integer | level : Ratio |
| Background_Match_Quantity | Total number of genes that are annotated to the GO term. | integer | level : Ratio |
| Background_Total_Quantity | Total number of human genes. | number | level : Ratio |
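The Highest_GO_Level description says a term can sit at several levels because it may have more than one parent. The sketch below illustrates that idea on a toy graph. Everything in it is an assumption made for illustration: the `PARENTS` map is a simplified hierarchy built around the "hexose biosynthetic process" example, not the real GO graph; the helper names are hypothetical; and "highest level" is read as the level closest to the root, with the root at level 0 so that its children are level 1.

```python
# Toy child -> parents map (NOT the actual GO graph; edges are simplified).
PARENTS = {
    "metabolic process": ["biological process"],
    "biosynthetic process": ["metabolic process"],
    "carbohydrate metabolic process": ["metabolic process"],
    "monosaccharide metabolic process": ["carbohydrate metabolic process"],
    "monosaccharide biosynthetic process": [
        "monosaccharide metabolic process",
        "biosynthetic process",
    ],
    "hexose metabolic process": ["monosaccharide metabolic process"],
    "hexose biosynthetic process": [
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    ],
}


def levels_of(term, parents=PARENTS):
    """Return every level at which `term` appears (root = level 0)."""
    if term not in parents:  # a term without parents is the root
        return {0}
    found = set()
    for parent in parents[term]:
        found |= {level + 1 for level in levels_of(parent, parents)}
    return found


def highest_level(term, parents=PARENTS):
    """Assumption: 'highest' means closest to the root, i.e. the smallest level."""
    return min(levels_of(term, parents))


print(sorted(levels_of("hexose biosynthetic process")))  # [4, 5]
print(highest_level("hexose biosynthetic process"))      # 4
```

In this toy graph the term is reachable at levels 4 and 5 depending on the path taken, and only the smaller value would be reported under the stated reading of "highest".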
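| Chemical_Name | Chemical_ID | Cas_Registry_Name | Gene_Ontology | GO_Term_Name | GO_Term_ID | Highest_GO_Level | P_Value | Corrected_P_Value | Target_Match_Quantity | Target_Total_Quantity | Background_Match_Quantity | Background_Total_Quantity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BP 897 | C120565 | | Biological Process | sleep | GO:0030431 | 2 | 4.14e-10 | 4.35e-07 | 3 | 3 | 33 | 42937 |
| Tea | D013662 | | Biological Process | aging | GO:0007568 | 2 | 3.03e-16 | 1.29e-12 | 12 | 46 | 297 | 42937 |
| FK 706 | C109799 | | Biological Process | taxis | GO:0042330 | 2 | 8.40e-07 | 1.37e-03 | 4 | 7 | 540 | 42937 |
| ginsan | C111520 | | Biological Process | taxis | GO:0042330 | 2 | 1.68e-08 | 2.64e-05 | 5 | 8 | 540 | 42937 |
| GW2974 | C506645 | | Biological Process | taxis | GO:0042330 | 2 | 1.98e-06 | 2.03e-03 | 3 | 3 | 540 | 42937 |
| HU 211 | C062018 | | Cellular Component | axon | GO:0030424 | 4 | 3.32e-11 | 1.17e-07 | 9 | 28 | 527 | 42937 |
| KB 5-21 | C055830 | | Biological Process | sleep | GO:0030431 | 2 | 5.73e-07 | 2.38e-04 | 2 | 2 | 33 | 42937 |
| LR-90 | C485003 | | Biological Process | aging | GO:0007568 | 2 | 2.91e-08 | 8.23e-05 | 5 | 14 | 297 | 42937 |
| LR-90 | C485003 | | Biological Process | taxis | GO:0042330 | 2 | 1.06e-08 | 3.00e-05 | 6 | 14 | 540 | 42937 |
| NF023 | C105374 | | Cellular Component | caveola | GO:0005901 | 4 | 3.17e-06 | 2.78e-03 | 2 | 2 | 77 | 42937 |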
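Because P_Value and Corrected_P_Value are typed as strings, they need to be converted before any numeric work. The sketch below does that for three of the sample rows and then estimates how many tests the Bonferroni adjustment accounted for. It assumes a plain Bonferroni correction (corrected = raw p-value × number of tests, capped at 1); the dataset does not publish the number of tests, so the helper `implied_test_count` is hypothetical and its result is only an inference from rounded values.

```python
# (chemical, GO term, P_Value, Corrected_P_Value) copied from the sample rows.
# Both p-value fields arrive as strings, as declared in the schema.
ROWS = [
    ("LR-90", "aging", "2.91e-08", "8.23e-05"),
    ("LR-90", "taxis", "1.06e-08", "3.00e-05"),
    ("KB 5-21", "sleep", "5.73e-07", "2.38e-04"),
]


def implied_test_count(raw, corrected):
    """Estimate the Bonferroni multiplier from a raw/corrected p-value pair.

    Assumption: corrected = min(1, raw * n_tests). When the corrected value
    has been capped at 1 the ratio is only a lower bound, so None is returned.
    """
    raw_p, corrected_p = float(raw), float(corrected)
    if corrected_p >= 1.0:
        return None
    return round(corrected_p / raw_p)


for chemical, term, raw, corrected in ROWS:
    print(chemical, term, implied_test_count(raw, corrected))
```

Under this assumption the two LR-90 rows give nearly the same multiplier (roughly 2,830), which is consistent with one correction factor per chemical, while KB 5-21 gives a much smaller one. The small difference between the LR-90 rows comes from the p-values being rounded to three significant figures.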