Chemical Gene Interactions

$179 / year

This dataset lists research publications, curated in the Comparative Toxicogenomics Database (CTD), that provide evidence of an interaction between a chemical and a gene. It records the type of interaction, its degree, and which form or region of the gene is affected. The listed studies were performed in humans and in other species.


The dataset includes several standardized identifiers for each chemical and gene. These provide cross-platform compatibility, making it possible to look up the chemical and the gene in major scientific databases and to locate the references for the research.

The Comparative Toxicogenomics Database (CTD) datasets can be used to build a query tool that returns inferred relationships between genes, chemicals and diseases, together with the significance of each inference. The information in this dataset (research publications) is part of the output returned for inferences that have direct evidence.

Chemicals are among the main environmental factors that influence health, and the ways in which they cause disease are not fully understood. The purpose of the Comparative Toxicogenomics Database is to provide a tool for generating new hypotheses about the role of chemicals in the development of disease. It does so by collecting curated data reported in the scientific literature on chemicals, genes and diseases, and by making inferences about the relationships among these three elements. This is accomplished through transitive inference: when, for example, a chemical and a disease share interactions with one or more genes, a relationship between the chemical and the disease is inferred, linked to a process or product of those genes. From this information it may be possible to infer the mechanism of action of the chemical upon the gene in producing the disease, the genes linked to the disease, the pathophysiology of the disease, and more. Inferences can also run in other directions; for example, a gene and a disease could share the same group of chemicals. Some inferences have direct evidence, meaning that published research supports the relationship. Others have no direct evidence in the literature and can be used to create new testable hypotheses about the mechanism of disease, initiate new research on the relationship, and potentially predict disease treatment and prevention.
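The idea of transitive inference can be illustrated with a minimal Python sketch. The chemical, gene and disease names below are hypothetical placeholders, not CTD records, and the function `infer_chemical_disease` is an illustrative helper rather than anything CTD provides. The sketch only finds chemical and disease pairs that share genes; it does not attempt to compute the significance of an inference.

```python
from collections import defaultdict

# Hypothetical toy data (not taken from CTD): curated chemical-gene and
# gene-disease pairs of the kind the database collects.
chemical_gene = [
    ("chemical_A", "GENE1"),
    ("chemical_A", "GENE2"),
    ("chemical_B", "GENE3"),
]
gene_disease = [
    ("GENE1", "disease_X"),
    ("GENE2", "disease_X"),
    ("GENE3", "disease_Y"),
]


def infer_chemical_disease(chemical_gene, gene_disease):
    """Return {(chemical, disease): sorted list of the genes they share}."""
    # Index diseases by gene so each chemical-gene pair can be joined quickly.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    # A chemical and a disease are linked when at least one gene appears
    # in both a chemical-gene pair and a gene-disease pair.
    shared = defaultdict(set)
    for chemical, gene in chemical_gene:
        for disease in diseases_by_gene.get(gene, ()):
            shared[(chemical, disease)].add(gene)
    return {pair: sorted(genes) for pair, genes in shared.items()}


inferences = infer_chemical_disease(chemical_gene, gene_disease)
for (chemical, disease), genes in sorted(inferences.items()):
    print(f"{chemical} -> {disease} via {', '.join(genes)}")
```

Swapping the roles of the inputs gives inferences in the other directions mentioned above, for example linking a gene and a disease through the chemicals they share.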

Date Created


Last Modified




Update Frequency


Temporal Coverage


Spatial Coverage



John Snow Labs; Comparative Toxicogenomics Database;

Source License URL

Source License Requirements

Publicly available and free for research applications; citation is required. Permission must be requested for commercial uses.

Source Citation

Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database update 2017. Nucleic Acids Res. 2016 Sep 19;[Epub ahead of print]


Toxicogenomics, Gene Disease Association, Gene Chemical Pathways, Comparative Toxicogenomics Database, Relationships Between Chemicals and Diseases, Chemical and Disease Inferences, Chemical Disease Hypotheses

Other Titles

Genes Responding to External Chemical Cues, Interaction Actions of Chemicals in Biological Process, Interaction Between Genes and Chemical Exposure, Chemical Gene Interactions Disorder Type, Chemical Gene Interactions Disorder Men, Chemical Gene Interactions Symptoms in Women

Field: Chemical_Name
Description: Name of the chemical associated with the gene.
Type: string
Constraints: required : 1

Field: Chemical_ID
Description: Identification number of the chemical in the US National Library of Medicine's Medical Subject Headings (MeSH). MeSH is a controlled vocabulary of thousands of biomedical terms that standardizes the terminology used in published life-sciences texts. Each MeSH term has a unique identifier, which can be 7 to 8 characters long. The MeSH unique identifier was changed to a length of 10 characters after November 2013.
Type: string
Constraints: required : 1

Field: Cas_Registry_Number
Description: CAS Registry Number (if available). Unique numeric identifier designated by CAS for the chemical substance. The CAS Registry Number also serves as a reference for finding information on the specific chemical. CAS is a division of the American Chemical Society (ACS); the CAS registry collects information on millions of chemical substances identified since the early 1900s.
Type: string
Constraints: -

Field: Gene_Symbol
Description: Short-form abbreviation of the name of the gene interacting with the chemical. The approved symbols for human genes are collected in the HUGO Gene Nomenclature Committee database; each name and symbol is unique for every gene and can be applied to other species.
Type: string
Constraints: -

Field: Gene_ID
Description: Unique identifier for the gene in the National Center for Biotechnology Information (NCBI) Entrez Gene database. This unique integer can be looked up in the Entrez system online to find the nomenclature, sequence, products and other specific details of the gene. The identifier is species specific: the gene ID of a human gene cannot be applied to the same gene in a different species.
Type: integer
Constraints: level : Nominal, required : 1

Field: Gene_Forms
Description: State of the gene or region of the gene affected by the chemical ('|'-delimited list). A gene is a localized section of DNA. The gene DNA consists of different regions that regulate the expression of the gene: enhancer/silencer, promoter, exon and intron. Through a process called transcription, this DNA is copied into messenger RNA (mRNA). The mRNA is a copy of the gene DNA sequence; in order to function, it contains the following regions in this order: 5' UTR, protein coding region, 3' UTR and poly-A tail. The mRNA is later translated into a protein. The DNA sequence is built from nucleotides; a change in one of these nucleotides is called a single-nucleotide polymorphism (SNP). A change in a group of three nucleotides that creates variants of one gene is called a polymorphism; a gene is polymorphic when different variants of it exist. A mutation is any alteration of the gene due to an error during the formation of DNA or to DNA damage.
Type: string
Constraints: -

Field: Organism
Description: Scientific name of the organism or species that carries the gene in which the interaction was described.
Type: string
Constraints: -

Field: Organism_ID
Description: Taxonomy identification number (taxid) of the organism or species that carries the gene in which the interaction was described, as per the National Center for Biotechnology Information (NCBI) Entrez Taxonomy database.
Type: integer
Constraints: level : Nominal

Field: Interaction
Description: Description of the interaction (reaction) between the chemical and the gene. Describes how the chemical affects the gene, a protein product of the gene or a mechanism of action.
Type: string
Constraints: required : 1

Field: Interaction_Actions
Description: Degree of the interaction between the chemical and the gene ('|'-delimited list). Each chemical–gene interaction is qualified by a degree: increases, decreases, affects, or does not affect. The "affects" degree is used when the reference does not describe a more specific degree. Interactions with the "does not affect" degree were excluded from this data.
Type: string
Constraints: required : 1

Field: PubMed_ID
Description: Identification number(s) of the publication(s) in the PubMed database ('|'-delimited list) that provide direct evidence of the chemical–gene interaction. PubMed is a US National Library of Medicine citation database that contains millions of abstracts, references and full-text links for biomedical literature from different trusted sources.
Type: string
Constraints: required : 1
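Several fields (Gene_Forms, Interaction_Actions, PubMed_ID) are '|'-delimited lists stored in a single string. The following Python sketch shows one way to split them. The helper names `split_list` and `parse_actions` are illustrative, and reading each Interaction_Actions item as a degree followed by `^` and an interaction type is an assumption based on the sample values (for example `decreases^activity`), not on a documented specification.

```python
def split_list(value):
    """Split a '|'-delimited field into a list; an empty field gives []."""
    if not value:
        return []
    return [item for item in value.split("|") if item]


def parse_actions(value):
    """Split Interaction_Actions into (degree, interaction type) pairs.

    Assumes each item looks like 'degree^type', as in the sample data.
    """
    pairs = []
    for item in split_list(value):
        degree, _, kind = item.partition("^")
        pairs.append((degree, kind))
    return pairs


actions = parse_actions("affects^binding|affects^folding|decreases^activity")
for degree, kind in actions:
    print(degree, kind)
```

Keeping the degree and the interaction type as separate values makes it straightforward to filter rows, for instance to keep only interactions whose degree is "decreases".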
Chemical Name,Chemical ID,Cas Registry Number,Gene Symbol,Gene ID,Gene Forms,Organism,Organism ID,Interaction,Interaction Actions,PubMed ID
10074-G5,C534883,,MAX,4149,protein,,,10074-G5 affects the folding of and results in decreased activity of [MYC protein binds to MAX protein],affects^binding|affects^folding|decreases^activity,26474287
10074-G5,C534883,,MAX,4149,protein,,,10074-G5 inhibits the reaction [MYC protein binds to MAX protein],affects^binding|decreases^reaction,26474287
10074-G5,C534883,,MYC,4609,protein,Homo sapiens,9606.0,10074-G5 analog results in decreased expression of MYC protein,decreases^expression,26036281
10074-G5,C534883,,MYC,4609,protein,Homo sapiens,9606.0,10074-G5 results in decreased activity of MYC protein,decreases^activity,25716159
10074-G5,C534883,,MYC,4609,protein,Homo sapiens,9606.0,10074-G5 results in decreased expression of MYC protein,decreases^expression,26036281
10074-G5,C534883,,MYC,4609,protein,,,10074-G5 affects the folding of and results in decreased activity of [MYC protein binds to MAX protein],affects^binding|affects^folding|decreases^activity,26474287
10074-G5,C534883,,MYC,4609,protein,,,10074-G5 inhibits the reaction [MYC protein binds to MAX protein],affects^binding|decreases^reaction,26474287
"10,10-bis(4-pyridinylmethyl)-9(10H)-anthracenone",C112297,,FOS,2353,protein,Mus musculus,10090.0,"10,10-bis(4-pyridinylmethyl)-9(10H)-anthracenone inhibits the reaction [Valproic Acid inhibits the reaction [Kainic Acid results in increased expression of FOS protein]]",decreases^reaction|increases^expression,26348896
"10,10-bis(4-pyridinylmethyl)-9(10H)-anthracenone",C112297,,KCNQ1,3784,protein,,,"10,10-bis(4-pyridinylmethyl)-9(10H)-anthracenone results in decreased activity of KCNQ1 protein",decreases^activity,18568022
"10,10-bis(4-pyridinylmethyl)-9(10H)-anthracenone",C112297,,KCNQ2,3785,protein,Mus musculus,10090.0,"10,10-bis(4-pyridinylmethyl)-9(10H)-anthracenone affects the reaction [[Potassium results in increased activity of KCNQ2 protein] which results in increased import of Thallium]",affects^reaction|increases^activity|increases^import,15634793
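
In the sample rows, Organism_ID appears as a value such as 9606.0 even though the field is described as an integer, and it is empty in some rows. A small Python sketch of one way to normalize such values follows. The helper `parse_optional_int` is hypothetical, and the `row` dict is simply one sample row written out by hand using the schema field names.

```python
def parse_optional_int(text):
    """Convert '9606.0' or '9606' to 9606; return None for an empty field."""
    text = text.strip()
    if not text:
        return None
    number = float(text)
    if not number.is_integer():
        raise ValueError(f"not a whole number: {text!r}")
    return int(number)


# One of the sample rows, reduced to the fields used here.
row = {
    "Chemical_Name": "10074-G5",
    "Gene_Symbol": "MYC",
    "Gene_ID": "4609",
    "Organism": "Homo sapiens",
    "Organism_ID": "9606.0",
}

gene_id = parse_optional_int(row["Gene_ID"])
organism_id = parse_optional_int(row["Organism_ID"])
missing = parse_optional_int("")  # rows without an organism have no taxid
print(gene_id, organism_id, missing)  # prints: 4609 9606 None
```

Returning None for an empty field keeps rows without an organism distinguishable from rows with a valid taxonomy identifier.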