- Protein Chemical Structure Comparison from HGNC and ChEMBL
- Protein Structure Comparison from Three Drug Databases
- Structural Proteins Comparison from Three Drug Databases
- Protein List Structure Comparison from Three Drug Databases
- Protein Levels Comparison from Three Drug Databases
- Protein Chemical Structure
- Protein Structure
- Amino Acid Structure
- Protein Molecules
- Structure of Proteins
- Structural Proteins
- Protein List
- Protein Levels
Protein Chemical Structure Comparison from Three Drug Databases
This dataset, Protein Chemical Structure Comparison from Three Drug Databases, is a 3-way consensus list selected from the paper “Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database” (2013). It contains the 352 proteins common to all three drug databases.
Get The Data
- Research: Non-Commercial, Share-Alike, Attribution; Free Forever
- Commercial: Commercial Use, Remix & Adapt, White Label
This dataset comes from a selection of attributed target lists extracted from the literature and provided as supplementary data alongside other downloadable databases. Lists were included for their coverage of human drug targets; however, the exact definition of a drug target varies between lists, as explained in the descriptions below. This includes differences in terminology (e.g. “successful”, “approved” or “proven”) and in mapping type: primary targets (~1:1 drug:protein) vs. secondary or subunit mappings (1:many).
The dataset supports two main uses: a) following the database links and b) comparing lists for intersects (protein IDs in common) and differentials (protein IDs unique to particular lists or subsets). Such comparisons can extend to lists generated in other studies or published work (e.g. expression data or disease-association gene candidates). Further areas to explore are a) other utilities that users find valuable and b) recently published target lists recommended for inclusion.
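Comparing lists for intersects and differentials reduces to set operations on UniProtKB accessions. The sketch below assumes each list has already been loaded as a Python set of accessions; the helper `compare_lists` and the list names are hypothetical, and apart from P01024 and P45452 (taken from this dataset's sample rows) the IDs are placeholders.

```python
from __future__ import annotations


def compare_lists(lists: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Compare named sets of UniProtKB accessions (hypothetical helper).

    Requires at least one list. Returns:
        intersect: accessions present in every list.
        differentials: for each list, the accessions found in no other list.
    """
    intersect = set.intersection(*lists.values())
    differentials = {}
    for name, ids in lists.items():
        # Union of every other list; empty if only one list is given.
        others = set().union(*(v for k, v in lists.items() if k != name))
        differentials[name] = ids - others
    return intersect, differentials


# P01024 and P45452 come from the sample rows; "HYPOTHETICAL_1" stands in
# for an ID from an external list (e.g. expression hits).
lists = {
    "three_db_consensus": {"P01024", "P45452"},
    "expression_hits": {"P45452", "HYPOTHETICAL_1"},
}
intersect, diff = compare_lists(lists)
print(sorted(intersect))  # ['P45452']
print({k: sorted(v) for k, v in diff.items()})
# {'three_db_consensus': ['P01024'], 'expression_hits': ['HYPOTHETICAL_1']}
```

Passing more than two lists works the same way, so the intersect of three database lists gives a 3-way consensus.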
The source provides only minimal metadata descriptions, since context is given in the references and/or in the download descriptions of the corresponding databases or sources. The lists are Excel sheets with live links to UniProtKB, HGNC and ChEMBL.
Lists not originally given as UniProtKB accessions are normalized to them (e.g. by mapping HUGO Gene Nomenclature Committee (HGNC) symbols or Entrez Gene IDs (EGIDs) to UniProtKB). They are then filtered to human Swiss-Prot entries (i.e. any TrEMBL entries are removed) and, where the original list offers the option, to approved drug targets. The hosted lists are therefore transformations of the primary sources rather than direct facsimiles.
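The normalization and filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the source's actual pipeline: the `UniProtRecord` type, the `normalize` function and the in-memory `mapping` table are hypothetical stand-ins for whatever ID-mapping resource is used. The C3 → P01024 and MMP13 → P45452 pairs come from the sample rows; the TrEMBL and mouse records are invented to exercise the filters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UniProtRecord:
    accession: str
    organism: str   # e.g. "Homo sapiens"
    reviewed: bool  # True = Swiss-Prot (reviewed), False = TrEMBL (unreviewed)


def normalize(
    symbols: list[str],
    mapping: dict[str, list[UniProtRecord]],
    approved: set[str] | None = None,
) -> tuple[set[str], list[str]]:
    """Map gene symbols to human Swiss-Prot accessions (hypothetical helper).

    If `approved` is given (the original list offers an approved-target
    option), only accessions in that set are kept.
    Returns the accessions and the symbols that could not be mapped.
    """
    accessions: set[str] = set()
    unmapped: list[str] = []
    for symbol in symbols:
        hits = {
            rec.accession
            for rec in mapping.get(symbol, [])
            if rec.organism == "Homo sapiens" and rec.reviewed  # drop non-human and TrEMBL
        }
        if not hits:
            unmapped.append(symbol)
        accessions |= hits
    if approved is not None:
        accessions &= approved
    return accessions, unmapped


mapping = {
    "C3": [UniProtRecord("P01024", "Homo sapiens", True)],
    "MMP13": [
        UniProtRecord("P45452", "Homo sapiens", True),
        UniProtRecord("TREMBL_EXAMPLE", "Homo sapiens", False),  # hypothetical TrEMBL entry
        UniProtRecord("MOUSE_EXAMPLE", "Mus musculus", True),    # hypothetical non-human entry
    ],
}
accessions, unmapped = normalize(["C3", "MMP13", "UNKNOWN_SYMBOL"], mapping)
print(sorted(accessions), unmapped)  # ['P01024', 'P45452'] ['UNKNOWN_SYMBOL']
```

Symbols whose only hits are TrEMBL or non-human records end up in `unmapped`, which keeps losses from the filtering step visible instead of silent.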
ID cross-mappings are not perfect, so absolute correctness is not guaranteed. However, the International Union of Basic and Clinical Pharmacology (IUPHAR) and British Pharmacological Society (BPS) Guide to PHARMACOLOGY versions are supplied in good faith, and the originals are available in every case.
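One way to locate imperfect cross-mappings is to audit their cardinality: a source ID that maps to several accessions, or several source IDs that converge on one accession, deserves manual review. The sketch below is a hypothetical helper, not part of the dataset; only the HGNC:1318 → P01024 and HGNC:7159 → P45452 pairs come from the sample rows, and the remaining pairs are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def audit_mapping(
    pairs: list[tuple[str, str]],
) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Flag ambiguous (source_id, accession) cross-mappings (hypothetical helper).

    Returns:
        one_to_many: source IDs that map to more than one accession.
        many_to_one: accessions reached from more than one source ID.
    """
    forward: defaultdict[str, set[str]] = defaultdict(set)
    reverse: defaultdict[str, set[str]] = defaultdict(set)
    for source_id, accession in pairs:
        forward[source_id].add(accession)
        reverse[accession].add(source_id)
    one_to_many = {s: accs for s, accs in forward.items() if len(accs) > 1}
    many_to_one = {a: srcs for a, srcs in reverse.items() if len(srcs) > 1}
    return one_to_many, many_to_one


pairs = [
    ("HGNC:1318", "P01024"),
    ("HGNC:7159", "P45452"),
    ("HGNC:HYPO_A", "P45452"),  # hypothetical: a second ID converging on P45452
    ("HGNC:HYPO_B", "ACC_X"),   # hypothetical: one ID mapping to two accessions
    ("HGNC:HYPO_B", "ACC_Y"),
]
one_to_many, many_to_one = audit_mapping(pairs)
print({k: sorted(v) for k, v in one_to_many.items()})  # {'HGNC:HYPO_B': ['ACC_X', 'ACC_Y']}
print({k: sorted(v) for k, v in many_to_one.items()})  # {'P45452': ['HGNC:7159', 'HGNC:HYPO_A']}
```

A one-to-many hit is not always an error (subunit mappings are legitimately 1:many), so the output is a review list rather than a set of entries to drop.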
For readers unfamiliar with protein list “slicing and dicing”, the source recommends the following:
1. The UniProtKB interface
2. Venny (for comparing up to four lists)
3. PANTHER (for displaying detailed protein classifications and attributes from lists)
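Venny-style comparisons report how many IDs fall in each exclusive region, i.e. IDs present in exactly one combination of lists. The partition can be sketched in a few lines; `venn_regions` and the toy lists are hypothetical, and the four-list cap simply mirrors Venny's limit as stated above. With three lists, the region shared by all three corresponds to a 3-way consensus of the kind this dataset provides.

```python
from __future__ import annotations

from itertools import combinations


def venn_regions(lists: dict[str, set[str]]) -> dict[tuple[str, ...], set[str]]:
    """Partition IDs into exclusive Venn regions (hypothetical helper).

    Each key is a combination of list names; its value holds the IDs that are
    in every list of that combination and in none of the other lists.
    """
    if not 1 <= len(lists) <= 4:
        raise ValueError("this sketch mirrors Venny's limit of one to four lists")
    names = list(lists)
    regions: dict[tuple[str, ...], set[str]] = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(lists[n] for n in combo))
            outside = set().union(*(lists[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


# Toy lists with placeholder IDs.
lists = {"A": {"p1", "p2", "p3"}, "B": {"p2", "p3", "p4"}, "C": {"p3", "p5"}}
regions = venn_regions(lists)
for combo, ids in regions.items():
    print("&".join(combo), len(ids), sorted(ids))
# A 1 ['p1']
# B 1 ['p4']
# C 1 ['p5']
# A&B 1 ['p2']
# A&C 0 []
# B&C 0 []
# A&B&C 1 ['p3']
```

Because the regions are disjoint, their union recovers every ID exactly once, which is a quick sanity check on the counts shown in a diagram.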
“ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database are resources of curated chemistry-to-protein relationships widely used in the chemogenomic arena. In this work we have extended an earlier analysis (PMID 22821596) by comparing chemistry and protein target content between 2010 and 2013. For the former, details are presented for overlaps and differences, statistics of stereochemistry as well as stereo representation and MW profiles between the four databases. For 2013 our results indicate quality improvements, major expansion, increased achiral structures and changes in MW distributions. An orthogonal comparison of chemical content with different sources inside PubChem highlights further interpretable differences. Expansion of protein content by UniProt IDs is also recorded for 2013 and Gene Ontology comparisons for human-only sets indicate differences. These emphasise the expanding complementarity of chemistry-to-protein relationships between sources, although different criteria are used for their capture.” Wiley Online Library Abstract on “Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database”.
About this Dataset
John Snow Labs; International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS) Guide to PHARMACOLOGY
|Source License URL|
|Source License Requirements||The Guide to PHARMACOLOGY database is licensed under the Open Data Commons Open Database License (ODbL). Its contents are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported license.|
Christopher Southan, Markus Sitzmann, Sorel Muresan, "Comparing the Chemical Structure and Protein Content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database". Wiley Online Library, 11 December 2013.
Protein Chemical Structure, Protein Structure, Amino Acid Structure, Protein Molecules, Structure of Proteins, Structural Proteins, Protein List, Protein Levels, HGNC, ChEMBL
|Field||Description||Type||Constraint|
|UniProtKB_Entry||UniProtKB/Swiss-Prot accession of the entry (e.g. P01024), from the ‘Entry information’ section. This is the accession, not the mnemonic entry name; the entry name is assigned to each reviewed entry upon integration into UniProtKB/Swiss-Prot but is not a stable identifier.||string||required : 1|
|Protein_Names||Protein names from the ‘Names and Taxonomy’ section. This subsection provides an exhaustive list of all names of the protein, from commonly used to obsolete, to allow unambiguous identification. It may also include information on the activity of the protein, such as a precise description of the catalytic mechanism of enzymes, or on individual protein chains or functional domains it contains, where pertinent.||string||required : 1|
|Cross_Reference_HGNC_ID||HGNC ID and gene symbol for each protein (e.g. HGNC:1318, C3).||string||required : 1|
|HGNC_Link||Link to the HGNC database entry cross-referenced in UniProtKB/Swiss-Prot > HGNC.||string||required : 1|
|Cross_Reference_ChEMBL_ID||ChEMBL target ID for each protein (e.g. CHEMBL4917).||string||-|
|ChEMBL_Link||Link to the ChEMBL database entry cross-referenced in UniProtKB/Swiss-Prot > ChEMBL.||string||-|
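|UniProtKB_Entry||Protein_Names||Cross_Reference_HGNC_ID||HGNC_Link||Cross_Reference_ChEMBL_ID||ChEMBL_Link|
|P01024||Complement C3||HGNC:1318, C3||http://www.genenames.org/data/hgnc_data.php?hgnc_id=1318||CHEMBL4917||https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL4917|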
|P45452||Collagenase 3||HGNC:7159, MMP13||http://www.genenames.org/data/hgnc_data.php?hgnc_id=7159||CHEMBL280||https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL280|
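The HGNC_Link and ChEMBL_Link columns in the sample rows follow a simple pattern built from the ID columns. The sketch below rebuilds them from `Cross_Reference_HGNC_ID` and `Cross_Reference_ChEMBL_ID`. `build_links` is a hypothetical helper, and the URL templates are copied from the sample rows, so they reflect the URLs as published and may no longer match the current HGNC and ChEMBL websites.

```python
from __future__ import annotations

# URL templates copied from the sample rows; the live sites may have changed since.
HGNC_URL = "http://www.genenames.org/data/hgnc_data.php?hgnc_id={}"
CHEMBL_URL = "https://www.ebi.ac.uk/chembldb/target/inspect/{}"


def build_links(hgnc_field: str, chembl_id: str | None = None) -> tuple[str, str | None]:
    """Rebuild (HGNC_Link, ChEMBL_Link) from the ID columns (hypothetical helper).

    hgnc_field looks like "HGNC:1318, C3" (ID, then gene symbol).
    ChEMBL_Link is None when no ChEMBL ID is given, since that column is optional.
    """
    hgnc_id = hgnc_field.split(",", 1)[0].strip()
    prefix, _, number = hgnc_id.partition(":")
    if prefix != "HGNC" or not number.isdigit():
        raise ValueError(f"unexpected HGNC ID: {hgnc_id!r}")
    hgnc_link = HGNC_URL.format(number)
    chembl_link = CHEMBL_URL.format(chembl_id) if chembl_id else None
    return hgnc_link, chembl_link


print(build_links("HGNC:1318, C3", "CHEMBL4917"))
# ('http://www.genenames.org/data/hgnc_data.php?hgnc_id=1318', 'https://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL4917')
print(build_links("HGNC:7159, MMP13"))  # optional ChEMBL ID omitted
# ('http://www.genenames.org/data/hgnc_data.php?hgnc_id=7159', None)
```

Rebuilding links from IDs, rather than storing them, makes it easy to switch to a newer URL scheme if the legacy pages stop resolving.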