Gene Expression Vocabulary

$179 / year

This dataset contains the vocabulary terms that the Comparative Toxicogenomics Database (CTD) uses to describe the activity of genes inferred to interact with a chemical or to be associated with a disease. For each gene it provides several standardized identifiers (NCBI Gene, BioGRID, PharmGKB and UniProt). This makes the data usable across platforms, because the gene and its characteristics can be looked up in the major scientific databases.


The purpose of the Comparative Toxicogenomics Database (CTD) is to help researchers generate new hypotheses about the mechanisms by which chemicals contribute to the development of diseases. To do so, it collects curated data on chemicals, genes and diseases reported in the scientific literature and infers relationships among these three elements.

The CTD datasets can be used to build a query tool that returns inferred relationships between genes, chemicals and diseases, together with the significance of each inference. When a query returns genes related to a chemical, the terms in this dataset are displayed in the output for those genes.
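
As a rough illustration of that display step, the sketch below pairs the gene IDs returned by a chemical query with their vocabulary terms. It is a minimal sketch under stated assumptions: the query result is a hypothetical list of Entrez Gene IDs, the two vocabulary entries are copied from this dataset's sample rows, and the names GeneTerm and annotate_query_result are illustrative only, not part of CTD.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class GeneTerm:
    """One vocabulary entry (hypothetical container keeping only three columns)."""
    symbol: str
    name: str
    gene_id: int


# Two entries copied from this dataset's sample rows.
VOCABULARY = [
    GeneTerm("03B03F", "DNA segment, 03B03F (Research Genetics)", 27777),
    GeneTerm("10S", "DNA segment, 10S", 27451),
]


def annotate_query_result(
    gene_ids: Iterable[int], vocabulary: Iterable[GeneTerm]
) -> List[Tuple[int, Optional[GeneTerm]]]:
    """Pair each gene ID from a query result with its vocabulary term.

    IDs missing from the vocabulary are paired with None so gaps stay visible.
    """
    by_id = {term.gene_id: term for term in vocabulary}
    return [(gene_id, by_id.get(gene_id)) for gene_id in gene_ids]


# Hypothetical query output; 0 is not a valid Gene ID and is used to show a miss.
query_hits = [27451, 27777, 0]
annotated = annotate_query_result(query_hits, VOCABULARY)
for gene_id, term in annotated:
    label = f"{term.symbol}: {term.name}" if term else "not in vocabulary"
    print(gene_id, label)
```

Keying the lookup on Gene_ID rather than on the symbol keeps the join unambiguous, since the ID is the column documented as a unique identifier.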

John Snow Labs; Comparative Toxicogenomics Database;

Source License Requirements

Publicly available and free for research use, but citation is required. Commercial use requires permission.

Source Citation

Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database update 2017. Nucleic Acids Res. 2016 Sep 19;[Epub ahead of print]


Toxicogenomics, Gene Disease Association, Gene Chemical Pathways, Activity of Genes Vocabulary, Mechanism of Chemicals, Gene and Disease Relationship, Comparative Toxicogenomics Database, Chemical and Disease Inferences, Gene Nomenclature, Gene Definitions

Other Titles

Comparative Toxicogenomics Terms of the Vocabulary, Comparative Toxicogenomics Database Gene Expression Vocabulary

Gene_Symbol (string): Short-form abbreviation of the name of the gene that interacts with the chemical. Approved symbols for human genes are collected in the HUGO Gene Nomenclature Committee (HGNC) database; each approved name and symbol is unique to one gene and can also be applied to that gene in other species.
Gene_Name (string): Name of the gene.
Gene_ID (integer; level: Nominal): Unique identifier of the gene in the Entrez Gene database of the National Center for Biotechnology Information (NCBI). This integer can be looked up in the online Entrez system to find the gene's nomenclature, sequence, products and other details. The identifier is species-specific: the ID of a human gene cannot be used for the same gene in a different species.
Alternative_Gene_ID (string): Alternative NCBI Gene identifiers ('|'-delimited list).
Synonyms (string): Other names for the gene ('|'-delimited list).
Bio_Grid_ID (string): Identifiers of the gene in the BioGRID database ('|'-delimited list). The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans. When this description was written, BioGRID held over 1,400,000 interactions curated from both high-throughput datasets and individual focused studies, derived from over 57,000 publications in the primary literature.
PharmGKB_IDs (string): Identifiers of the gene in the PharmGKB database ('|'-delimited list). PharmGKB is a pharmacogenomics knowledge resource that collects, curates and disseminates knowledge about the impact of human genetic variation on drug responses. It covers clinical information such as dosing guidelines and drug labels, potentially clinically actionable gene-drug associations, and genotype-phenotype relationships.
UniProt_IDs (string): Accessions, in the UniProt database, of the proteins encoded by the gene ('|'-delimited list). The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.
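
Because five of these columns hold '|'-delimited lists and the sample shows Gene_ID values such as 27777.0, a reader usually needs a small parsing step. The sketch below assumes a comma-separated export whose header uses the schema's column names; that file layout, and the helpers split_list and parse_row, are assumptions for illustration. The two data rows are copied from this dataset's sample.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, List

# Columns documented as '|'-delimited lists.
LIST_FIELDS = ("Alternative_Gene_ID", "Synonyms", "Bio_Grid_ID",
               "PharmGKB_IDs", "UniProt_IDs")

# Assumed CSV layout; names containing commas must be quoted.
SAMPLE_CSV = """\
Gene_Symbol,Gene_Name,Gene_ID,Alternative_Gene_ID,Synonyms,Bio_Grid_ID,PharmGKB_IDs,UniProt_IDs
03B03F,"DNA segment, 03B03F (Research Genetics)",27777.0,,,,,
10A1.1.S,retrotransposon-like element 10A1 gene 1 S homeolog,379519.0,,10a1.1|Xretpos,,,
"""


def split_list(value: str) -> List[str]:
    """Split a '|'-delimited cell; an empty cell yields an empty list."""
    return [item for item in value.split("|") if item]


def parse_row(row: Dict[str, str]) -> Dict[str, object]:
    """Convert Gene_ID to int and every list column to a Python list."""
    record: Dict[str, object] = dict(row)
    # Gene_ID may appear as '27777.0'; Decimal converts it to an int exactly.
    record["Gene_ID"] = int(Decimal(row["Gene_ID"]))
    for field in LIST_FIELDS:
        record[field] = split_list(row[field])
    return record


records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for record in records:
    print(record["Gene_Symbol"], record["Gene_ID"], record["Synonyms"])
```

Filtering out empty strings in split_list means an empty cell becomes [] rather than [""], so "no synonyms" and "no UniProt accession" are easy to test for.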

Sample rows. Each line gives Gene Symbol; Gene Name; Gene ID, followed by any other non-empty fields. Bio Grid ID, PharmGKB IDs and UniProt IDs are empty in every row shown.
03B03F; DNA segment, 03B03F (Research Genetics); 27777
03B03R; DNA segment, 03B03R (Research Genetics); 27778
03.MMHAP34FRA.SEQ; DNA segment, 03.MMHAP34FRA.seq; 53288
102G4T7; DNA segment, 102g4T7; 56573
106I22-SP6; DNA segment, 106I22-Sp6; 53159
109F12R; DNA segment, 102F12R; 56574
10A1.1.S; retrotransposon-like element 10A1 gene 1 S homeolog; 379519; Synonyms: 10a1.1|Xretpos
10RIBBFT; Backfat at tenth rib; 407484; Alternative Gene ID: 100322921|100326046|100326095|100326256|100328128|100529885|100529886|100530249|100530360|100530450|100884666|100884812|100884853|100884926|407504|407596; Synonyms: 10THRIB|Backfat thickness at 10th rib
10S; DNA segment, 10S; 27451
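
Rows such as 10RIBBFT and 10A1.1.S show that one gene can be reached through its primary Gene ID, several alternative Gene IDs, or a synonym. The sketch below builds two lookup tables that resolve any of these back to the primary Gene_ID. It is a minimal sketch, assuming the rows have already been split on '|'; the function names are hypothetical. Values are primary IDs rather than symbols because, as the schema notes, a symbol can be reused for the same gene in other species while Gene IDs are species-specific.

```python
from typing import Dict, List

# Two sample rows from this dataset, already split on '|'.
ROWS = [
    {
        "Gene_Symbol": "10RIBBFT",
        "Gene_ID": 407484,
        "Alternative_Gene_ID": [
            "100322921", "100326046", "100326095", "100326256", "100328128",
            "100529885", "100529886", "100530249", "100530360", "100530450",
            "100884666", "100884812", "100884853", "100884926", "407504", "407596",
        ],
        "Synonyms": ["10THRIB", "Backfat thickness at 10th rib"],
    },
    {
        "Gene_Symbol": "10A1.1.S",
        "Gene_ID": 379519,
        "Alternative_Gene_ID": [],
        "Synonyms": ["10a1.1", "Xretpos"],
    },
]


def build_id_index(rows: List[dict]) -> Dict[int, int]:
    """Map every primary or alternative NCBI Gene ID to the row's primary Gene_ID.

    Primary IDs are indexed first, so an alternative ID never overwrites them;
    an alternative ID only fills a key that is still free.
    """
    index: Dict[int, int] = {}
    for row in rows:
        index[row["Gene_ID"]] = row["Gene_ID"]
    for row in rows:
        for alt_id in row["Alternative_Gene_ID"]:
            index.setdefault(int(alt_id), row["Gene_ID"])
    return index


def build_synonym_index(rows: List[dict]) -> Dict[str, List[int]]:
    """Map each lower-cased symbol or synonym to the primary Gene_IDs that use it.

    Values are lists because a name can be shared by several genes.
    """
    index: Dict[str, List[int]] = {}
    for row in rows:
        for name in [row["Gene_Symbol"], *row["Synonyms"]]:
            gene_ids = index.setdefault(name.lower(), [])
            if row["Gene_ID"] not in gene_ids:
                gene_ids.append(row["Gene_ID"])
    return index


rows_by_id = {row["Gene_ID"]: row for row in ROWS}
id_index = build_id_index(ROWS)
synonym_index = build_synonym_index(ROWS)

print(rows_by_id[id_index[407596]]["Gene_Symbol"])  # alternative ID -> 10RIBBFT
print(synonym_index["xretpos"])                     # synonym -> [379519]
```

Lower-casing the names makes the synonym lookup case-insensitive, which matters here because the same gene appears as 10A1.1.S in the symbol column and 10a1.1 among its synonyms.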