
Evidence-Based Medicine (EBM) and Data Science – Part 1: A primer to Evidence-Based Medicine

There is a continuous need to know the best and most recent line of treatment for every known medical condition. Centuries of cumulative experience across different cultures have yielded an enormous amount of data and research. A clinical decision maker has to pick the best clinical decision out of everything accumulated over the years. A fast and accurate search process leads to a better and more effective healthcare service that is more satisfactory to the patient.

If the clinician follows the right EBM methodology, this can also serve as a defense in court against claims, such as why he or she chose one line of treatment and not another.

This blog is an attempt to simplify the process of Evidence-Based Medicine for junior data scientists, as many of them come from scientific backgrounds other than healthcare or know little about healthcare research methodologies.

This post is just an introduction to EBM and to the most famous healthcare databases and their structure. The next parts will be published in the coming weeks to discuss the process of mining these databases.

Introducing Evidence-Based Medicine

1.1 What is EBM?

Evidence-Based Medicine is the practice of choosing the best and most up-to-date clinical decision using a systematic approach. The concept was first introduced at McMaster University by David Sackett in the early 1990s.

1.2 Steps to acquire the best evidence

1.2.1 Formulate your PICOT question

The PICOT format (population, intervention, comparison, outcome, and time) is a methodology for designing a proper search strategy. Framing a research question in the PICOT format is considered an evidence-based approach to tracking down the literature.
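As a rough illustration, a PICOT question can be held in a small data structure so that each element stays explicit when it is later turned into search keywords. The sketch below is only one possible representation: the class name `PicotQuestion`, its methods, and the clinical example are hypothetical and do not come from any EBM tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PicotQuestion:
    """Hypothetical container for the five PICOT elements."""
    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str

    def as_sentence(self) -> str:
        # Render the five elements as a single answerable question.
        return (
            f"In {self.population}, does {self.intervention}, "
            f"compared with {self.comparison}, affect {self.outcome} "
            f"within {self.time}?"
        )

    def keywords(self) -> list[str]:
        # One starting keyword per element, in P-I-C-O-T order.
        return [getattr(self, f.name) for f in fields(self)]


# Made-up example question, for illustration only.
question = PicotQuestion(
    population="adults with type 2 diabetes",
    intervention="metformin",
    comparison="placebo",
    outcome="HbA1c level",
    time="6 months",
)
print(question.as_sentence())
print(question.keywords())
```

Keeping the elements separate, instead of storing one free-text question, makes it easier to build the search strategy element by element in the next step.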

1.2.2 Track down the literature

This step entails setting a search strategy that includes specific keywords plus inclusion and exclusion criteria. The search then runs on these keywords, keeping the results that meet the inclusion criteria and discarding those that meet the exclusion criteria.
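The screening logic described above can be sketched in a few lines. This is a deliberately naive version that assumes each record is a dictionary with `title` and `abstract` keys and that criteria are plain substrings; the function names and the sample records are invented for illustration, and real screening relies on much richer criteria and human review.

```python
def matches_any(text, terms):
    """Return True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def screen(records, keywords, include, exclude):
    """Keep records that hit a keyword, meet an inclusion term, and hit no exclusion term."""
    kept = []
    for rec in records:
        text = f"{rec['title']} {rec['abstract']}"
        if not matches_any(text, keywords):
            continue  # not about the topic at all
        if include and not matches_any(text, include):
            continue  # fails the inclusion criteria
        if matches_any(text, exclude):
            continue  # meets an exclusion criterion
        kept.append(rec)
    return kept


# Invented records, for illustration only.
records = [
    {"id": 1, "title": "Metformin versus placebo in adults",
     "abstract": "A randomized controlled trial."},
    {"id": 2, "title": "Metformin in a laboratory model",
     "abstract": "A randomized animal study."},
    {"id": 3, "title": "Statins and cholesterol",
     "abstract": "A randomized controlled trial."},
    {"id": 4, "title": "Metformin in one patient",
     "abstract": "A case report."},
]

kept = screen(records, keywords=["metformin"], include=["randomized"], exclude=["animal"])
print([rec["id"] for rec in kept])
```

Substring matching is crude (it would also match a term inside a longer word), so in practice the criteria are usually expressed through the database's own filters and controlled vocabulary.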

1.2.2.1 Search the Healthcare Research Databases

Healthcare research databases are collections of journals, magazine articles, dissertations, systematic reviews, and abstracts. Such collections are acquired by a library, and their contents are reviewed and organized.

1. Medline/PubMed

This database is created and maintained by the United States National Library of Medicine (NLM) of the National Institutes of Health. It includes biomedical literature from 1966 onward and covers medicine, dentistry, nursing, veterinary medicine, health care services, and the preclinical sciences. PubMed is a free-access service that provides access to over 11 million MEDLINE citations, covering publications in 40 different languages.

The US National Library of Medicine offers a wonderful controlled vocabulary of medical terms called “MeSH” (Medical Subject Headings).

The MeSH database is available for download here.

The most recent ASCII MeSH download file at the time of writing is d2019.bin.

The record for a MeSH term contains:

  • A definition of the term.
  • Associated subheadings.
  • A list of entry terms.

You can use a Python script to create a SQLite database for MeSH. Note that a single MeSH term may have more than one MeSH code. We will explain the MeSH database and how to work with it in the coming posts.
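A minimal sketch of such a script is shown below. It rests on several assumptions that should be checked against the actual download: that each record in the ASCII file starts with a `*NEWRECORD` line, that fields are written as `TAG = value`, and that `MH`, `MN`, `MS`, and `UI` hold the heading, the codes (tree numbers), the definition, and the unique identifier. The sample text, identifiers, and table layout are made up for illustration and are not real MeSH content.

```python
import sqlite3

# Hand-written sample in the assumed "TAG = value" layout (not real MeSH data).
SAMPLE = """*NEWRECORD
MH = Example Term A
MN = A01.111
MN = B02.222.333
MS = Illustrative definition for term A.
ENTRY = Example Synonym A
UI = X000001

*NEWRECORD
MH = Example Term B
MN = C03.444
UI = X000002
"""


def parse_mesh_ascii(text):
    """Split the text into records; each record maps a tag to a list of values."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "*NEWRECORD":
            current = {}
            records.append(current)
        elif " = " in line and current is not None:
            tag, value = line.split(" = ", 1)
            current.setdefault(tag, []).append(value)
    return records


def build_mesh_db(records, path=":memory:"):
    """Store terms and codes in two tables, since one term may carry several codes."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE term (ui TEXT PRIMARY KEY, heading TEXT, definition TEXT)")
    conn.execute("CREATE TABLE code (ui TEXT, code TEXT)")
    for rec in records:
        ui = rec["UI"][0]
        definition = rec.get("MS", [None])[0]
        conn.execute("INSERT INTO term VALUES (?, ?, ?)", (ui, rec["MH"][0], definition))
        conn.executemany(
            "INSERT INTO code VALUES (?, ?)", [(ui, c) for c in rec.get("MN", [])]
        )
    conn.commit()
    return conn


conn = build_mesh_db(parse_mesh_ascii(SAMPLE))
rows = conn.execute(
    "SELECT t.heading, c.code FROM term t JOIN code c ON c.ui = t.ui ORDER BY c.code"
).fetchall()
print(rows)
```

The separate `code` table reflects the note above: because one term can have more than one code, a one-to-many relation is easier to query than a single delimited column.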

The John Snow Labs catalog has more than 1,775 normalized health datasets; most of them are freshly curated and validated both by machine and manually.

Among these datasets, you can find a very valuable dataset that includes the “MEDLINE PubMed Journal Citation Database”. This dataset contains NLM’s database of citations and abstracts in the fields of medicine, nursing, dentistry, veterinary medicine, health care systems, and preclinical sciences.

2. Embase

Embase supports the EBM methodology by providing a search form that lets the user formulate the search using the 4 PICO elements. Embase provides literature from 1947 to the present, with 32 million records (including MEDLINE titles). It includes publications from more than 8,500 journals from over 95 countries; 2,900 of these journals are unique to Embase.

The Emtree thesaurus is a hierarchical controlled vocabulary for biomedicine and the related life sciences. It is used to index all of the content in the Embase database.

Emtree terms and their synonyms are used in the search query to enhance the outcome of the so-called “PICO-based search”.
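The idea behind a PICO-based search with synonyms can be sketched as a Boolean query: synonyms of the same element are joined with OR, and the elements themselves are joined with AND. The snippet below uses generic Boolean syntax, not Embase's actual query language, and the synonym lists are written by hand for illustration instead of being taken from Emtree; the function names are hypothetical.

```python
def or_group(terms):
    """Join the synonyms of one PICO element with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_pico_query(pico_terms):
    """Join the PICO elements with AND, skipping any element left empty."""
    return " AND ".join(or_group(terms) for terms in pico_terms.values() if terms)


# Hand-written synonym lists, for illustration only.
pico_terms = {
    "population": ["type 2 diabetes", "T2DM"],
    "intervention": ["metformin"],
    "comparison": ["placebo"],
    "outcome": ["HbA1c", "glycated hemoglobin"],
}

query = build_pico_query(pico_terms)
print(query)
```

Adding synonyms widens each OR group and so raises recall, while the AND between elements keeps the results tied to the whole question.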

3. Ovid

OvidOpenAccess provides more than 70,000 journal articles and abstracts from more than 200 peer-reviewed journals published by Medknow Publications.

Ovid uses term mapping and subheadings: the user can choose to expand a keyword so that the results include the stated term and all its related terms.

4. The Cochrane Collaboration

Cochrane’s objectives and strategies focus on supporting better healthcare decisions by maintaining up-to-date systematic reviews of randomized controlled trials of healthcare and providing online access to them.

Cochrane comprises 3 databases:

  • The Cochrane Database of Systematic Reviews.
  • The Cochrane Controlled Trials Register (contains about 300,000 controlled trials).
  • The Database of Abstracts of Reviews of Effectiveness (DARE).

5. Other databases and tools

Medline, Ovid, Embase, and Cochrane are not the only healthcare databases. Others include PsycINFO, ProQuest, CINAHL, and Google Scholar.

Other traditional search engines (like Google, Bing, and Yahoo) may be involved in the search process as well.

Please remember this for the coming posts:

Emtree: used for full-text indexing of all journal articles in Embase.

MeSH: used to index articles for MEDLINE.

Relying on one database is not enough. Of all the known published controlled trials, only 30% to 80% were identified by mining the MEDLINE database alone. Researchers agree that at least 2 databases should be included in any search strategy.
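When two or more databases are searched, the same study often comes back more than once, so the result sets have to be merged and de-duplicated. The sketch below assumes each record is a dictionary with a `title` and an optional `doi`, and it matches on DOI when present and on a normalized title otherwise; the field names, function names, and records are all hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase the title and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(*result_sets):
    """Merge (source, records) pairs, remembering which sources returned each study."""
    merged = {}
    for source, records in result_sets:
        for rec in records:
            key = (rec.get("doi") or "").lower() or normalize_title(rec["title"])
            entry = merged.setdefault(key, {"title": rec["title"], "sources": []})
            entry["sources"].append(source)
    return list(merged.values())


# Invented search results, for illustration only.
medline_hits = [
    {"title": "Trial A", "doi": "10.1000/a"},
    {"title": "Trial B", "doi": "10.1000/b"},
]
embase_hits = [
    {"title": "Trial A.", "doi": "10.1000/A"},
    {"title": "Trial C", "doi": None},
]

merged = merge_results(("MEDLINE", medline_hits), ("Embase", embase_hits))
only_second = [m["title"] for m in merged if m["sources"] == ["Embase"]]
print(len(merged), only_second)
```

One limitation of this simple key: a study listed with a DOI in one database and without one in the other would not be matched, so a real pipeline would need a fallback comparison on titles as well.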

1.2.3 Appraise the results

Your search results are not guaranteed to be reliable; the quality of the retrieved studies might be questionable. Systematic reviews that include a meta-analysis sit at the top of the “Evidence Pyramid” and are considered the most trustworthy. Systematic reviews without a meta-analysis rank second. Randomized controlled trials (RCTs) rank third, while case studies form the bottom of the Evidence Pyramid.

Results from Cochrane Reviews can therefore be a trustworthy source.
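The ranking described above can be applied to search results with a simple lookup table. This sketch covers only the four levels named in this post and assumes each study carries a `design` label using exactly these strings; the names `EVIDENCE_RANK` and `sort_by_evidence` and the sample studies are invented for illustration.

```python
# Rank 1 is the top of the pyramid; only the levels named in the text are listed.
EVIDENCE_RANK = {
    "systematic review with meta-analysis": 1,
    "systematic review": 2,
    "randomized controlled trial": 3,
    "case study": 4,
}


def sort_by_evidence(studies):
    """Order studies from strongest to weakest design; unknown designs go last."""
    lowest = len(EVIDENCE_RANK) + 1
    return sorted(studies, key=lambda s: EVIDENCE_RANK.get(s["design"], lowest))


# Invented studies, for illustration only.
studies = [
    {"title": "Study W", "design": "case study"},
    {"title": "Study X", "design": "randomized controlled trial"},
    {"title": "Study Y", "design": "systematic review with meta-analysis"},
    {"title": "Study Z", "design": "systematic review"},
]

for study in sort_by_evidence(studies):
    print(study["design"], "-", study["title"])
```

Sorting by design is only a first pass: it says nothing about how well an individual study was conducted, which is what the appraisal step addresses.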

The process of assessing clinical trial studies is called “Critical Appraisal”.

Critical Appraisal depends on assessing the “Internal Validity” of the study. This can be done by considering the following inquiries:

  • Were all the groups well-represented and compared?
  • Were the study results accurate and scalable?
  • Was there a placebo effect?

You must also check whether the outcome of the study could have happened by chance and how large the effect was.
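For a simple two-arm trial with a yes/no outcome, both questions can be approximated with basic arithmetic. The sketch below computes the risk ratio and the absolute risk reduction as measures of effect size, and a two-sided p-value from a two-proportion z-test (normal approximation) as a measure of chance. The trial counts are made up, the function name is hypothetical, and a real appraisal would normally use a statistics package and methods suited to the study design.

```python
import math


def appraise_two_arm_trial(events_treated, n_treated, events_control, n_control):
    """Effect size and chance for a two-arm trial with a binary outcome."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control

    # Effect size: how much the risk changed, relatively and absolutely.
    risk_ratio = risk_treated / risk_control
    absolute_risk_reduction = risk_control - risk_treated

    # Chance: two-proportion z-test using the pooled event rate.
    pooled = (events_treated + events_control) / (n_treated + n_control)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_control))
    z = (risk_treated - risk_control) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

    return {
        "risk_ratio": risk_ratio,
        "absolute_risk_reduction": absolute_risk_reduction,
        "p_value": p_value,
    }


# Made-up counts: 15 of 100 treated patients and 30 of 100 controls had the event.
result = appraise_two_arm_trial(15, 100, 30, 100)
print(result)
```

A small p-value suggests the difference is unlikely to be due to chance alone, while the risk ratio and absolute risk reduction show whether the difference is large enough to matter clinically. The sketch does not guard against empty arms or a control arm with zero events.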

1.2.4 Apply the results (the evidence)

After assessing “Internal Validity”, “External Validity” comes into play. Compare your patients with the patients in the study, ask yourself whether the intervention can be applied in your facility, and search for alternatives if it cannot be applied there for a specific reason.

1.2.5 Measure the effectiveness and performance of the process

You must monitor and record the whole process, from setting your research question to implementing the guidelines suggested by your search results.

The next parts will be published in the coming weeks to discuss the process of mining these databases using data science tools.
