Protein Chemical Structure Comparison from Three Drug Databases

$79 / year

This dataset, Protein Chemical Structure Comparison from Three Drug Databases, is a selection of a three-way consensus list from the paper “Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database” (2013). It includes the 352 proteins common to all three drug databases.

Complexity

This dataset is from a selection of attributed target lists extracted from the literature as supplementary data for other downloadable databases. The criterion for inclusion is drug target coverage for these human proteins. However, the exact definition varies between lists, as explained in the description below. This includes different terminology (e.g. “successful”, “approved” or “proven”). There are also differences between primary target mappings (~1:1 drug:protein) and secondary or subunit mappings (1:many).

The dataset supports two main uses: a) following the database links and b) comparing lists for intersects (protein IDs in common) and differentials (protein IDs unique to particular lists or subsets). These comparisons extend to lists generated in the course of other studies or taken from other published work (e.g. expression data or disease association gene candidates). Two further questions remain open: a) which other utilities users find valuable and b) which other recently published target lists should be included.
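The intersect and differential comparison described above can be sketched with Python sets. This is a minimal illustration, not the source's own tooling: the accessions come from this dataset's sample rows, while `expression_hits` and the helper `compare_lists` are hypothetical names introduced only for the example.

```python
# UniProtKB accessions taken from this dataset's sample rows.
consensus_targets = {"P00797", "P00734", "P08473", "P00747", "P09237"}

# Hypothetical list from another study (e.g. expression data); illustrative only.
expression_hits = {"P00734", "P09237", "P14679"}


def compare_lists(a, b):
    """Return the intersect and both differentials of two ID sets."""
    return {
        "intersect": a & b,  # protein IDs in common
        "only_a": a - b,     # IDs unique to the first list
        "only_b": b - a,     # IDs unique to the second list
    }


result = compare_lists(consensus_targets, expression_hits)
for label, ids in result.items():
    print(label, sorted(ids))
```

Sets are used because list order and duplicates carry no meaning for this kind of comparison; the same pattern extends to three or four lists by chaining `&` and `-`.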

The metadata descriptions from this source are minimal because context is provided in the references and/or the download descriptions for the relevant databases or sources. The lists are Excel sheets with live UniProtKB, HGNC and ChEMBL links.

Lists that are not UniProtKB Accessions in the first place are normalized to these (e.g. by mapping HUGO Gene Nomenclature Committee (HGNC) Symbols or Entrez Gene IDs (EGID) to UniProtKB). They are then filtered to human and Swiss-Prot entries (i.e. any TrEMBL entries are removed) and to approved drug targets if this is an option in the original list. In such cases the hosted lists become transformations, rather than direct facsimiles, of the primary sources.
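The normalize-then-filter steps above might look roughly like the following sketch. It is not the source's actual pipeline: the lookup tables `SYMBOL_TO_ACCESSION` and `ENTRIES`, the `Entry` type, and the placeholder identifiers (`GENE_X`, `GENE_Y`, `TREMBL_PLACEHOLDER`, `MOUSE_PLACEHOLDER`) are assumptions made for illustration. Only the three symbol-to-accession pairs for REN, F2 and MME come from this dataset's sample rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    accession: str  # UniProtKB accession
    organism: str   # e.g. "Homo sapiens"
    reviewed: bool  # True for Swiss-Prot, False for TrEMBL


# Hypothetical lookup tables; a real run would build them from an ID-mapping export.
SYMBOL_TO_ACCESSION = {
    "REN": "P00797",
    "F2": "P00734",
    "GENE_X": "TREMBL_PLACEHOLDER",  # placeholder for an unreviewed entry
    "GENE_Y": "MOUSE_PLACEHOLDER",   # placeholder for a non-human entry
}
ENTRIES = {
    "P00797": Entry("P00797", "Homo sapiens", True),
    "P00734": Entry("P00734", "Homo sapiens", True),
    "TREMBL_PLACEHOLDER": Entry("TREMBL_PLACEHOLDER", "Homo sapiens", False),
    "MOUSE_PLACEHOLDER": Entry("MOUSE_PLACEHOLDER", "Mus musculus", True),
}


def normalize(symbols):
    """Map symbols to accessions, then keep human Swiss-Prot entries only."""
    kept, unmapped = [], []
    for symbol in symbols:
        accession = SYMBOL_TO_ACCESSION.get(symbol)
        if accession is None:
            unmapped.append(symbol)  # cross-mappings are imperfect; record misses
            continue
        entry = ENTRIES[accession]
        is_human_swissprot = entry.organism == "Homo sapiens" and entry.reviewed
        if is_human_swissprot and accession not in kept:
            kept.append(accession)
    return kept, unmapped


kept, unmapped = normalize(["REN", "F2", "GENE_X", "GENE_Y", "UNKNOWN", "REN"])
print(kept)
print(unmapped)
```

Returning the unmapped symbols alongside the kept accessions makes the loss from imperfect cross-mapping visible rather than silent.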

Because such ID cross-mappings are not perfect, absolute correctness is not guaranteed. However, the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS) Guide to PHARMACOLOGY versions are supplied in good faith, and the originals are available in every case.

For readers unfamiliar with protein list “slicing and dicing”, the source recommends the following tools:

1. The UniProtKB interface
2. Venny (for comparing up to four lists)
3. Panther (for displaying detailed protein classifications and attributes from lists)

“ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database are resources of curated chemistry-to-protein relationships widely used in the chemogenomic arena. In this work we have extended an earlier analysis (PMID 22821596) by comparing chemistry and protein target content between 2010 and 2013. For the former, details are presented for overlaps and differences, statistics of stereochemistry as well as stereo representation and MW profiles between the four databases. For 2013 our results indicate quality improvements, major expansion, increased achiral structures and changes in MW distributions. An orthogonal comparison of chemical content with different sources inside PubChem highlights further interpretable differences. Expansion of protein content by UniProt IDs is also recorded for 2013 and Gene Ontology comparisons for human-only sets indicate differences. These emphasise the expanding complementarity of chemistry-to-protein relationships between sources, although different criteria are used for their capture.” Wiley Online Library Abstract on “Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database”.

Date Created

2013-12-11

Last Modified

2017-01-27

Version

2017-01-27

Update Frequency

Irregular

Temporal Coverage

N/A

Spatial Coverage

N/A

Source

John Snow Labs => International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS) Guide to PHARMACOLOGY

Source License URL

John Snow Labs Standard License

Source License Requirements

The Guide to PHARMACOLOGY database is licensed under the Open Data Commons Open Database License (ODbL). Its contents are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license.

Source Citation

Christopher Southan, Markus Sitzmann, Sorel Muresan, "Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database". Wiley Online Library, 11 December 2013.

Keywords

Protein Chemical Structure, Protein Structure, Amino Acid Structure, Protein Molecules, Structure of Proteins, Structural Proteins, Protein List, Protein Levels, HGNC, ChEMBL

Other Titles

Protein Chemical Structure Comparison from HGNC and ChEMBL, Protein Structure Comparison from Three Drug Databases, Structural Proteins Comparison from Three Drug Databases, Protein List Structure Comparison from Three Drug Databases, Protein Levels Comparison from Three Drug Databases

| Name | Description | Type | Constraints |
| --- | --- | --- | --- |
| UniProtKB_Entry | This subsection of the ‘Entry information’ section provides a mnemonic identifier for a UniProtKB entry, but it is not a stable identifier. Each reviewed entry is assigned a unique entry name upon integration into UniProtKB/Swiss-Prot. | string | required : 1 |
| Protein_Names | This subsection of the ‘Names and Taxonomy’ section provides an exhaustive list of all names of the protein, from commonly used to obsolete, to allow unambiguous identification of a protein. This subsection may also include information on the activity of the protein, such as a precise description of the catalytic mechanism of enzymes, or information about individual protein chains or functional domains contained within it, if pertinent. | string | required : 1 |
| Cross_Reference_HGNC_ID | HGNC ID for each protein name. | string | required : 1 |
| HGNC_Link | Links to the HGNC databases cross-referenced in UniProtKB Swiss-Prot > HGNC | string | required : 1 |
| Cross_Reference_ChEMBL_ID | ChEMBL ID for each protein name. | string | - |
| ChEMBL_Link | Links to the ChEMBL databases cross-referenced in UniProtKB Swiss-Prot > ChEMBL | string | - |


| UniProtKB_Entry | Protein_Names | Cross_Reference_HGNC_ID | HGNC_Link | Cross_Reference_ChEMBL_ID | ChEMBL_Link |
| --- | --- | --- | --- | --- | --- |
| P00797 | Renin | HGNC:9958, REN | http://www.genenames.org/data/hgnc_data.php?hgnc_id=9958 | CHEMBL286 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL286 |
| P00734 | Prothrombin | HGNC:3535, F2 | http://www.genenames.org/data/hgnc_data.php?hgnc_id=3535 | CHEMBL204 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL204 |
| P08473 | Neprilysin | HGNC:7154, MME | http://www.genenames.org/data/hgnc_data.php?hgnc_id=7154 | CHEMBL1944 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL1944 |
| P00747 | Plasminogen | HGNC:9071, PLG | http://www.genenames.org/data/hgnc_data.php?hgnc_id=9071 | CHEMBL1801 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL1801 |
| P09237 | Matrilysin | HGNC:7174, MMP7 | http://www.genenames.org/data/hgnc_data.php?hgnc_id=7174 | CHEMBL4073 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL4073 |
| P01024 | Complement C3 | HGNC:1318, C3 | http://www.genenames.org/data/hgnc_data.php?hgnc_id=1318 | CHEMBL4917 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL4917 |
| P08254 | Stromelysin-1 | HGNC:7173, MMP3 | http://www.genenames.org/data/hgnc_data.php?hgnc_id=7173 | CHEMBL283 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL283 |
| P14679 | Tyrosinase | HGNC:12442, TYR | http://www.genenames.org/data/hgnc_data.php?hgnc_id=12442 | CHEMBL1973 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL1973 |
| P06276 | Cholinesterase | HGNC:983, BCHE | http://www.genenames.org/data/hgnc_data.php?hgnc_id=983 | CHEMBL1914 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL1914 |
| P45452 | Collagenase 3 | HGNC:7159, MMP13 | http://www.genenames.org/data/hgnc_data.php?hgnc_id=7159 | CHEMBL280 | https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL280 |
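
The sample rows above could be read programmatically along the following lines. This is a sketch under stated assumptions: the source distributes Excel sheets, so the CSV layout used here, the helper `parse_rows`, and the output key names are all hypothetical. The column names and the two rows are copied from the sample data.

```python
import csv
import io

# Two sample rows written as CSV for illustration; the CSV layout is an
# assumption, since the source describes the lists as Excel sheets.
SAMPLE = '''UniProtKB_Entry,Protein_Names,Cross_Reference_HGNC_ID,HGNC_Link,Cross_Reference_ChEMBL_ID,ChEMBL_Link
P00797,Renin,"HGNC:9958, REN",http://www.genenames.org/data/hgnc_data.php?hgnc_id=9958,CHEMBL286,https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL286
P00734,Prothrombin,"HGNC:3535, F2",http://www.genenames.org/data/hgnc_data.php?hgnc_id=3535,CHEMBL204,https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL204
'''


def parse_rows(text):
    """Parse rows and split the combined HGNC field into ID and symbol."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # The HGNC column holds both values, e.g. "HGNC:9958, REN".
        hgnc_id, _, symbol = row["Cross_Reference_HGNC_ID"].partition(",")
        rows.append({
            "accession": row["UniProtKB_Entry"],
            "name": row["Protein_Names"],
            "hgnc_id": hgnc_id.strip(),
            "symbol": symbol.strip(),
            # The ChEMBL columns are not required in the schema, so an
            # empty cell is turned into None.
            "chembl_id": row["Cross_Reference_ChEMBL_ID"] or None,
        })
    return rows


records = parse_rows(SAMPLE)
for record in records:
    print(record["accession"], record["symbol"], record["chembl_id"])
```

Splitting the HGNC field gives a clean gene symbol to join on when comparing against lists keyed by symbol rather than by accession.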