This dataset is a selection of both approved and research drug targets extracted from the 2011 publication “Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds”. Lists were included on the basis of their coverage of approved and research drug targets.
The dataset can be used in at least two ways: a) following the database links and b) comparing the lists for intersects (protein IDs in common) and differentials (protein IDs unique to particular lists or subsets). The same comparisons can be extended to lists generated in the course of studying other published work (e.g. expression data or disease association gene candidates).
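The intersect and differential comparison described above can be sketched with plain Python sets. This is only an illustration: the function name `compare_lists`, the list names and the accession strings are assumptions made for the example and are not taken from the hosted sheets.

```python
def compare_lists(lists):
    """Compare named ID lists.

    `lists` maps a list name to an iterable of protein IDs.
    Returns (common, unique): the IDs present in every list, and
    for each list name the IDs that occur in no other list.
    """
    sets = {name: set(ids) for name, ids in lists.items()}
    # Intersect: IDs shared by all lists.
    common = set.intersection(*sets.values()) if sets else set()
    # Differential: IDs found in one list only.
    unique = {}
    for name, ids in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = ids - others
    return common, unique


# Hypothetical input: accession strings chosen only for illustration.
example_lists = {
    "list_a": ["P00533", "P04626", "P35354"],
    "list_b": ["P00533", "P35354", "P23219"],
}
common, unique = compare_lists(example_lists)
print(sorted(common))
print({name: sorted(ids) for name, ids in unique.items()})
```

The same function accepts any number of lists, so a list derived from other published work can simply be added as another entry in the dictionary.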
The metadata descriptions are minimal, since context is provided in the references and/or the download descriptions for the appropriate databases. The lists are Excel sheets of live links to UniProtKB, HGNC and ChEMBL. These entry points should provide access to most of the information available from other sources.
Lists that are not already UniProtKB accessions are normalized to them (e.g. by mapping HUGO Gene Nomenclature Committee (HGNC) symbols or Entrez Gene IDs (EGID) to UniProtKB). They are then filtered to human Swiss-Prot entries (i.e. any TrEMBL entries are removed) and, where the original list offers the option, to approved drug targets. In such cases, the hosted lists are transformations, rather than direct facsimiles, of the primary sources. Because such ID cross-mappings are not perfect, absolute correctness cannot be guaranteed. The versions are, however, supplied in good faith, and the originals are available in every case.
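The normalization and filtering steps described above might look roughly like the following sketch. It assumes a pre-built cross-mapping table is already at hand; the `Entry` record, the `normalize` function, the source IDs and the accessions are all hypothetical placeholders, and this is not the procedure actually used to build the hosted lists.

```python
from typing import NamedTuple

HUMAN_TAXON_ID = 9606  # NCBI taxonomy ID for Homo sapiens


class Entry(NamedTuple):
    """One UniProtKB record in a hypothetical cross-mapping table."""
    accession: str
    taxon_id: int
    reviewed: bool  # True for Swiss-Prot, False for TrEMBL


def normalize(source_ids, mapping):
    """Map source IDs to human Swiss-Prot accessions.

    `mapping` maps a source ID (e.g. an HGNC symbol or Entrez Gene ID)
    to a list of Entry records. Returns (kept, unmapped): the
    de-duplicated accessions in input order, and the source IDs that
    had no mapping at all.
    """
    kept, unmapped, seen = [], [], set()
    for source_id in source_ids:
        entries = mapping.get(source_id, [])
        if not entries:
            unmapped.append(source_id)
            continue
        for entry in entries:
            is_human = entry.taxon_id == HUMAN_TAXON_ID
            if is_human and entry.reviewed and entry.accession not in seen:
                seen.add(entry.accession)
                kept.append(entry.accession)
    return kept, unmapped


# Hypothetical mapping table with placeholder IDs and accessions.
toy_mapping = {
    "GENE_A": [Entry("ACC_A1", 9606, True), Entry("ACC_A2", 9606, False)],
    "GENE_B": [Entry("ACC_B1", 10090, True)],  # non-human, so dropped
    "GENE_C": [Entry("ACC_A1", 9606, True)],   # duplicate accession
}
kept, unmapped = normalize(["GENE_A", "GENE_B", "GENE_C", "GENE_D"], toy_mapping)
print(kept)
print(unmapped)
```

Returning the unmapped IDs alongside the kept accessions makes the imperfection of the cross-mapping visible, so dropped entries can be checked against the original list.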