Chemical Disease Associations

$199 / year

This dataset from the Comparative Toxicogenomics Database (CTD) contains relationships between chemicals and diseases. These relationships were inferred because the chemical and the disease each have an independent relationship with the same gene or group of genes. The inferences were made through curation of research publications, construction of network diagrams and statistical analysis. The dataset includes several types of standardized identifiers for each chemical and disease. These identifiers provide cross-platform compatibility: they make it possible to identify the chemical and the disease in major scientific databases and to locate the publications on which each inference was based. The dataset also provides an inference score, which indicates how strongly each inference is supported.

Complexity

Chemicals are among the main environmental factors that influence health, and the ways in which they cause disease are not fully understood. The purpose of the Comparative Toxicogenomics Database (CTD) is to provide a tool for generating new hypotheses about how chemicals contribute to the development of disease. CTD collects curated data on chemicals, genes and diseases reported in the scientific literature and makes inferences about the relationships among these three elements. This is accomplished through transitive inference: when a chemical and a disease both interact with one or more genes, a relationship between the chemical and the disease is inferred, linked to a process or product of those genes. From this information it may be possible to infer the mechanism by which the chemical acts on the gene to produce the disease, the genes linked to the disease, the pathophysiology of the disease, and more. “For example, if chemical A interacts with gene B, and independently gene B is associated with disease C, then chemical A is inferred to have a relationship with disease C (via gene B).”

Inferences can also run in other directions; for example, a gene and a disease can share the same group of chemicals. Some inferred relationships are backed by direct evidence, meaning that published research demonstrates the relationship. Others have no direct evidence in the literature; these can be used to formulate new testable hypotheses about disease mechanisms, to initiate new research on the relationship, and potentially to predict treatments and preventive measures for the disease.
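The transitive inference described above can be sketched in a few lines of Python. This is a minimal illustration, not CTD's implementation: the function names `infer_chemical_disease` and `split_by_evidence` are hypothetical, and the inputs are assumed to be plain (chemical, gene) and (gene, disease) pairs. Inferred pairs that also appear among curated chemical-disease associations are treated as having direct evidence; the rest are candidate hypotheses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def infer_chemical_disease(
    chem_gene: Iterable[Pair], gene_disease: Iterable[Pair]
) -> Dict[Pair, List[str]]:
    """Map each inferred (chemical, disease) pair to the genes that link them.

    Hypothetical helper: inputs are assumed to be curated (chemical, gene)
    interactions and curated (gene, disease) associations.
    """
    # Index diseases by gene so each chemical-gene edge is extended with one lookup.
    diseases_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    linking_genes: Dict[Pair, Set[str]] = defaultdict(set)
    for chemical, gene in chem_gene:
        # Chemical A interacts with gene B and gene B is associated with
        # disease C, so A is inferred to relate to C via B.
        for disease in diseases_by_gene.get(gene, ()):
            linking_genes[(chemical, disease)].add(gene)
    return {pair: sorted(genes) for pair, genes in linking_genes.items()}


def split_by_evidence(
    inferred: Dict[Pair, List[str]], curated_pairs: Set[Pair]
) -> Tuple[Dict[Pair, List[str]], Dict[Pair, List[str]]]:
    """Separate inferences with direct evidence from untested hypotheses."""
    supported = {p: g for p, g in inferred.items() if p in curated_pairs}
    hypotheses = {p: g for p, g in inferred.items() if p not in curated_pairs}
    return supported, hypotheses
```

With `chem_gene = [("A", "B")]` and `gene_disease = [("B", "C")]`, `infer_chemical_disease` returns `{("A", "C"): ["B"]}`, mirroring the CTD example quoted above.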

The CTD datasets can be used to build a query tool that returns inferred relationships between genes, chemicals and diseases together with the significance of each inference. To prioritize inferences, CTD uses the inference score, which ranks how strongly supported an inferred relationship is. The score is computed on a network in which chemicals, genes and diseases are nodes and the relationships between them are edges. The statistical analysis takes into account the number of nodes (genes, diseases or chemicals) that interact with the node of interest, the number of inferences with direct evidence, and the position of the node of interest in the network, using the hypergeometric clustering coefficient and common neighbor statistics. Inferences are then ranked from highest to lowest score, with the highest-scoring inferences considered the most significant.
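To make the idea of a common neighbor statistic concrete, the sketch below scores a chemical-disease pair with a one-sided hypergeometric test on the number of genes they share, reported as -log10(p), and then ranks diseases from highest to lowest score. This is a simplified, assumed stand-in, not CTD's published inference score: it omits the hypergeometric clustering coefficient and the direct-evidence counts, and the gene universe size `n_genes` and all function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def hypergeom_neg_log10_sf(k: int, N: int, K: int, n: int) -> float:
    """-log10 P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Exact integer binomials are used and logs are taken of the integers,
    so very small p-values do not underflow to 0.0.
    """
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return math.log10(total) - math.log10(tail)


def common_neighbor_score(chem_genes: Set[str], disease_genes: Set[str], n_genes: int) -> float:
    """Assumed score: how surprising the number of shared gene neighbors is."""
    shared = len(chem_genes & disease_genes)
    return hypergeom_neg_log10_sf(shared, n_genes, len(chem_genes), len(disease_genes))


def rank_diseases(
    chem_genes: Set[str], disease_genes_by_name: Dict[str, Set[str]], n_genes: int
) -> List[Tuple[str, float]]:
    """Rank diseases reachable through at least one shared gene, highest score first."""
    scored = [
        (disease, common_neighbor_score(chem_genes, genes, n_genes))
        for disease, genes in disease_genes_by_name.items()
        if chem_genes & genes  # no shared gene means no transitive inference
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In a toy universe of 20 genes, a chemical linked to four genes scores higher against a disease sharing three of its three genes than against one sharing only one of four, which matches the intuition that more shared neighbors make an inference more credible.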

1. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res. 2016 Sep 19;[Epub ahead of print]

Date Created

2004-01-20

Last Modified

2018-08-09

Version

2018-08-09

Update Frequency

Monthly

Temporal Coverage

N/A

Spatial Coverage

N/A

Source

John Snow Labs => Comparative Toxicogenomics Database

Source License URL

John Snow Labs Standard License

Source License Requirements

Publicly available and free for research use, but citation is required. Commercial use requires permission.

Source Citation

Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res. 2016 Sep 19;[Epub ahead of print]

Keywords

Toxicogenomics, Gene Disease Association, Gene Chemical Pathways, Comparative Toxicogenomics Database, Relationships Between Chemicals and Diseases, Chemical and Disease Inferences, Chemical Disease Hypotheses

Other Titles

Diseases and Genes Engaged After Chemical Exposure, Diseases and Genes Affected by Chemical Exposure, Chemical Disease Association Disorder Type

| Name | Description | Type | Constraints |
| --- | --- | --- | --- |
| Chemical_Name | Name of the chemical associated with the disease. | string | required: 1 |
| Chemical_ID | Identifier of the chemical in the US National Library of Medicine's Medical Subject Headings (MeSH). MeSH is a controlled vocabulary of thousands of biomedical terms that standardizes the terminology used in life-science publications. Each MeSH term has a unique identifier 7 to 8 characters long; after November 2013, the MeSH unique identifier was changed to 10 characters. | string | required: 1 |
| Cas_Registry_Number | Unique numeric identifier assigned by CAS to the chemical substance. The CAS Registry Number also serves as a reference for finding information on the specific chemical. CAS is a division of the American Chemical Society (ACS); the CAS Registry collects information on millions of chemical substances identified since the early 1900s. | string | - |
| Disease_Name | Name of the disease associated with the chemical. | string | required: 1 |
| Disease_ID | Unique identifier assigned to the disease by MeSH or OMIM, prefixed with its source vocabulary (as in MESH:D007729) and linked to the source record(s) for the disease. OMIM (Online Mendelian Inheritance in Man) is a database of human genes and genetic disorders that describes their genetic variation and expression; OMIM uses a six-digit identifier for each gene or genetic disorder. MeSH is a controlled vocabulary of thousands of biomedical terms (including diseases) that standardizes the terminology used in life-science publications. Each MeSH term has a unique identifier 7 to 8 characters long; after November 2013, the MeSH unique identifier was changed to 10 characters. | string | required: 1 |
| Direct_Evidence | Type of direct evidence for the association published in the scientific literature. A therapeutic association means that the publication took a therapeutic approach or found the chemical to be a potential therapy for the disease. A marker/mechanism association means that the publication focused on a marker or mechanism of the disease, or found that the chemical is involved in the mechanism of disease development. | string | - |
| Inference_Gene_Symbol | Symbol (short-form abbreviation of the name) of the gene through which the association between the chemical and the disease was inferred. Approved symbols for human genes are maintained by the HUGO Gene Nomenclature Committee (HGNC); each symbol and name is unique to one gene and can also be applied to other species. | string | - |
| Inference_Score | Score computed with statistics that take into account the connectivity of the chemical with the disease, the number of genes used to infer the association, and the connectivity of each of those genes. The higher the score, the more likely the inference is true. | number | level: Ordinal |
| Omim_ID | OMIM identifier(s) for the disease (pipe-delimited list). OMIM (Online Mendelian Inheritance in Man) is a database of human genes and genetic disorders that describes their genetic variation and expression; OMIM uses a six-digit identifier for each gene or genetic disorder. | string | - |
| PubMed_ID | PubMed identifier(s) of publications providing direct evidence that the chemical or gene is associated with the disease (pipe-delimited list). PubMed is a US National Library of Medicine citation database containing millions of abstracts, references and full-text links for biomedical literature from trusted sources. | string | - |
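The schema implies a small amount of parsing before the data can be queried: Disease_ID carries a vocabulary prefix, Omim_ID and PubMed_ID may hold pipe-delimited lists, and optional columns may be empty. The following is a minimal sketch, assuming each row arrives as a dictionary keyed by the column names above (for example from Python's `csv.DictReader`; the file's delimiter is not stated here). The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChemicalDiseaseRecord:
    """One dataset row with prefixed and list-valued fields unpacked (hypothetical)."""
    chemical_name: str
    chemical_id: str              # MeSH ID as stored, e.g. "D053260"
    disease_name: str
    disease_vocabulary: str       # prefix of Disease_ID, e.g. "MESH"
    disease_code: str             # Disease_ID without the prefix
    cas_registry_number: Optional[str] = None
    direct_evidence: Optional[str] = None
    inference_gene_symbol: Optional[str] = None
    inference_score: Optional[float] = None
    omim_ids: List[str] = field(default_factory=list)
    pubmed_ids: List[str] = field(default_factory=list)


def _blank_to_none(value: Optional[str]) -> Optional[str]:
    # Optional columns (marked "-" in the schema) may be empty strings.
    value = (value or "").strip()
    return value or None


def _split_pipe_list(value: Optional[str]) -> List[str]:
    # Omim_ID and PubMed_ID are pipe-delimited; an empty cell becomes [].
    return [part.strip() for part in (value or "").split("|") if part.strip()]


def parse_record(row: Dict[str, str]) -> ChemicalDiseaseRecord:
    # Assumption: keys are exactly the column names listed in the schema table.
    vocabulary, sep, code = row["Disease_ID"].partition(":")
    if not sep:  # no vocabulary prefix present; keep the whole value as the code
        vocabulary, code = "", vocabulary
    score = _blank_to_none(row.get("Inference_Score"))
    return ChemicalDiseaseRecord(
        chemical_name=row["Chemical_Name"],
        chemical_id=row["Chemical_ID"],
        disease_name=row["Disease_Name"],
        disease_vocabulary=vocabulary,
        disease_code=code,
        cas_registry_number=_blank_to_none(row.get("Cas_Registry_Number")),
        direct_evidence=_blank_to_none(row.get("Direct_Evidence")),
        inference_gene_symbol=_blank_to_none(row.get("Inference_Gene_Symbol")),
        inference_score=float(score) if score is not None else None,
        omim_ids=_split_pipe_list(row.get("Omim_ID")),
        pubmed_ids=_split_pipe_list(row.get("PubMed_ID")),
    )
```

Keeping the vocabulary prefix separate from the code makes it straightforward to join Disease_ID against MeSH or OMIM records, which is the cross-database use the identifiers are meant to support.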
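| Chemical_Name | Chemical_ID | Cas_Registry_Number | Disease_Name | Disease_ID | Direct_Evidence | Inference_Gene_Symbol | Inference_Score | Omim_ID | PubMed_ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Soot | D053260 | | Kuru | MESH:D007729 | | PRNP | 4.30 | 245300 | |
| Soot | D053260 | | Pain | MESH:D010146 | | ALB | 5.55 | | 4126124 |
| Coal | D003031 | | Fever | MESH:D005334 | | IL2 | 5.58 | | 8635092 |
| Coal | D003031 | | Pain | MESH:D010146 | | IL2 | 2.99 | | 12421473 |
| Dust | D004391 | | Pain | MESH:D010146 | | CSF2 | 2.53 | | 8622042 |
| Dust | D004391 | | Pain | MESH:D010146 | | IL2 | 2.53 | | 12421473 |
| Gur | C060188 | | Anemia | MESH:D000740 | | GSR | 7.82 | | 5984971 |
| Gur | C060188 | | Gout | MESH:D006073 | | IL1B | 4.53 | | 26462562 |
| Gur | C060188 | | Malaria | MESH:D008288 | | TNF | 4.05 | 611162 | |
| LR-90 | C485003 | | Pain | MESH:D010146 | | ALB | 4.55 | | 4126124 |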
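In the sample rows, each row carries a single inference gene, so a chemical-disease pair linked through several genes appears more than once with the same score (Dust and Pain, via CSF2 and IL2). The sketch below, a hypothetical helper rather than part of the dataset, collapses rows into one entry per pair and ranks the chemicals associated with a disease by inference score. It assumes the score is a property of the pair and keeps the maximum if rows ever disagree.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (Chemical_Name, Disease_Name, Inference_Gene_Symbol, Inference_Score) from the sample rows.
SAMPLE = [
    ("Soot", "Kuru", "PRNP", 4.30),
    ("Soot", "Pain", "ALB", 5.55),
    ("Coal", "Fever", "IL2", 5.58),
    ("Coal", "Pain", "IL2", 2.99),
    ("Dust", "Pain", "CSF2", 2.53),
    ("Dust", "Pain", "IL2", 2.53),
    ("Gur", "Anemia", "GSR", 7.82),
    ("Gur", "Gout", "IL1B", 4.53),
    ("Gur", "Malaria", "TNF", 4.05),
    ("LR-90", "Pain", "ALB", 4.55),
]


def aggregate(
    rows: Iterable[Tuple[str, str, str, float]]
) -> Dict[Tuple[str, str], Tuple[float, List[str]]]:
    """Collapse one-gene-per-row records into (score, sorted genes) per pair."""
    genes: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    scores: Dict[Tuple[str, str], float] = {}
    for chemical, disease, gene, score in rows:
        key = (chemical, disease)
        genes[key].add(gene)
        # Assumption: the score belongs to the pair; keep the max if rows disagree.
        scores[key] = max(score, scores.get(key, score))
    return {key: (scores[key], sorted(genes[key])) for key in genes}


def rank_chemicals_for(
    disease: str, aggregated: Dict[Tuple[str, str], Tuple[float, List[str]]]
) -> List[Tuple[str, float, List[str]]]:
    """Chemicals associated with `disease`, highest inference score first."""
    hits = [
        (chemical, score, gene_list)
        for (chemical, d), (score, gene_list) in aggregated.items()
        if d == disease
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For Pain, this orders the sample chemicals as Soot (5.55), LR-90 (4.55), Coal (2.99) and Dust (2.53, via CSF2 and IL2).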