
Extracting Critical Insights on Opioid Use Disorder with Healthcare NLP Models

This blog post explores how John Snow Labs’ Healthcare NLP models extract critical insights on opioid use disorder. Using advanced Natural Language Processing (NLP) techniques, these models identify and categorize medical terminology related to opioid addiction, which improves clinical understanding and supports better treatment strategies. By processing unstructured sources such as clinical notes, patient records, and research articles, they surface patterns, trends, and correlations that enable more effective prevention, diagnosis, and treatment.

Opioid Use Disorder has emerged as a significant public health crisis, affecting millions worldwide and placing immense pressure on healthcare systems. The complexity of this epidemic demands advanced analytical tools to understand its multifaceted nature and develop effective interventions. Traditional data analysis methods often fall short due to the unstructured nature of most medical data, such as clinical notes, patient records, and research articles. Here, NLP offers a powerful solution.

John Snow Labs, renowned for its innovative contributions to healthcare data science, provides a comprehensive NLP library specifically designed to address the challenges in medical data analysis. This library leverages state-of-the-art techniques, including Named Entity Recognition (NER), assertion models, and relation extraction, to extract meaningful insights from unstructured text. By harnessing these tools, healthcare professionals can uncover critical information on Opioid Use Disorder, enabling more precise diagnosis, treatment, and prevention strategies.

This post covers using pretrained models from John Snow Labs’ Healthcare NLP library to extract entities related to opioids and their adverse effects. These models are valuable for extracting clinical entities from text because they are designed to handle and analyze unstructured medical data, which makes up much of healthcare records. They offer several critical advantages, including:

  • Accurate entity recognition
  • Contextual understanding (assertion models)
  • Extraction of relationships between entities
  • Improved clinical decision-making
  • Support for research

Let us start with a short introduction to Spark NLP and then discuss opioid drug analysis in detail, along with results.


Healthcare NLP & LLM

The Healthcare Library is a powerful component of John Snow Labs’ Spark NLP platform, designed to facilitate NLP tasks within the healthcare domain. This library provides over 2,200 pre-trained models and pipelines tailored for medical data, supporting accurate information extraction, NER for clinical and medical concepts, and broader text analysis. Regularly updated and built with cutting-edge algorithms, the Healthcare library aims to streamline information processing and empower healthcare professionals with deeper insights from unstructured medical data sources, such as electronic health records, clinical notes, and biomedical literature.

John Snow Labs’ GitHub repository serves as a collaborative platform where users can access open-source resources, including code samples, tutorials, and projects, to further enhance their understanding and utilization of Spark NLP and related tools.

John Snow Labs also offers periodic certification training to help users gain expertise in utilizing the Healthcare Library and other components of their NLP platform.

John Snow Labs’ demo page provides a user-friendly interface for exploring the library’s capabilities. Users can interactively test and visualize models and features, which gives a clearer picture of how these tools apply to real-world scenarios in healthcare and other domains.


Opioid Crisis

The National Institutes of Health (NIH) defines opioids as follows: “Opioids are a class of drugs that include synthetic opioids such as fentanyl; pain relievers available legally by prescription, such as oxycodone (OxyContin®), hydrocodone (Vicodin®), codeine, morphine; the illegal drug heroin; and many others.”

The use of opioids, whether alone or with other substances, significantly contributes to the drug overdose crisis in the United States. In recent years, most overdose deaths have involved illicitly manufactured fentanyl and other powerful synthetic opioids, which are often mixed with other drugs and consumed unknowingly.

The infographic below summarizes critical figures on drug overdose deaths in the United States. It reports that over 96,700 people die from drug overdoses annually, with opioids involved in 72% of these cases. Since 1999, nearly a million lives have been lost to drug overdoses. The map shows total annual overdose deaths by state, with California (6,198 deaths), Florida (6,266 deaths), Pennsylvania (4,377 deaths), Ohio (4,251 deaths), and Texas (3,136 deaths) among the most affected. Together, these figures show how extensive the opioid and overdose crisis is nationwide.
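As a quick check on these figures, the arithmetic below converts the 72% share into an approximate annual count and ranks the five states named above. It is a minimal sketch that uses only the numbers quoted from the infographic; because the total is given as “over 96,700”, the opioid-involved count is an approximate lower bound, not an official statistic.

```python
# Figures as quoted from the infographic.
annual_overdose_deaths = 96_700   # "over 96,700" overdose deaths per year
opioid_share = 0.72               # opioids involved in 72% of these cases

# Approximate number of opioid-involved overdose deaths per year.
opioid_involved = round(annual_overdose_deaths * opioid_share)

# Annual overdose deaths for the states named on the map.
state_deaths = {
    "California": 6_198,
    "Florida": 6_266,
    "Pennsylvania": 4_377,
    "Ohio": 4_251,
    "Texas": 3_136,
}

# Rank the listed states from most to fewest annual overdose deaths.
ranked = sorted(state_deaths.items(), key=lambda item: item[1], reverse=True)

print(f"Opioid-involved overdose deaths per year: ~{opioid_involved:,}")
for state, deaths in ranked:
    print(f"{state}: {deaths:,}")
```

This yields roughly 69,624 opioid-involved deaths per year, and the ranking shows Florida narrowly ahead of California among the listed states.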


John Snow Labs’ opioid models are crucial in tackling the opioid crisis by providing advanced tools for analyzing unstructured healthcare data. These models accurately identify and contextualize critical information related to opioid use disorder, such as drug mentions, dosages, and patient symptoms. By mapping intricate relationships between entities and integrating data from diverse sources, these NLP models enable healthcare professionals to gain deep insights into this crisis. This facilitates better clinical decision-making, supports targeted interventions, and informs public health policies, ultimately contributing to more effective management and prevention of opioid-related issues.

NLP models combine several techniques: NER to identify and classify opioid-related terms, assertion models to understand the context in which those terms appear, and relation extraction to map the relationships between entities, such as the link between opioid prescriptions and subsequent adverse effects or the interactions between different medications.


Named Entity Recognition

NER models are crucial for identifying and categorizing entities within text, such as drugs, symptoms, medical conditions, and patient demographics. In the context of opioid use, the ner_opioid model can automatically recognize mentions of opioids, other (non-opioid) prescription medications, adverse reactions, diseases, and more (22 entity labels in total) across large volumes of clinical documentation.

Entities extracted by ner_opioid.
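NER models of this kind typically tag each token with an IOB label (B- begins an entity, I- continues it, O marks a token outside any entity), and a converter step then merges those tags into entity chunks. The plain-Python sketch below illustrates that merging logic. The sentence and its tags are hypothetical and hand-written, not real ner_opioid output, and the function is an illustration of the idea rather than the library’s own converter.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk being built (if any) and reset the buffer.
        nonlocal current_tokens, current_label
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_label:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        elif tag.startswith(("B-", "I-")):
            # B- starts a new entity; a stray I- is leniently treated the same way.
            flush()
            current_tokens, current_label = [token], tag[2:]
        else:
            # "O": the token is outside any entity.
            flush()
    flush()
    return chunks


# Hypothetical tokens and hand-written tags (not real ner_opioid output).
tokens = ["She", "stopped", "taking", "oxycodone", "and",
          "developed", "muscle", "aches", "."]
tags = ["O", "O", "O", "B-opioid_drug", "O",
        "O", "B-general_symptoms", "I-general_symptoms", "O"]

print(iob_to_chunks(tokens, tags))
```

Joining tokens with single spaces is a simplification; a production converter would typically use character offsets so that each chunk matches the original text exactly.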

By systematically extracting and categorizing this information, NER models help build a structured dataset that forms the foundation for further analysis.

Extracting entities in a structured format improves usability and integration, enabling efficient retrieval and comprehensive analysis of patient information. It enhances consistency and standardization, supporting advanced analytical techniques and better decision-making. Overall, it transforms raw data into actionable insights, leading to improved patient care, effective research, and informed public health strategies.

Extracted data in a structured format.
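To make the benefit of a structured format concrete, the sketch below stores extracted entities as one row per entity (document ID, entity text, label), serializes them to CSV, and runs two simple aggregations. It uses only the Python standard library; the document IDs and entities are hypothetical, and in a real workflow the rows would come from the model output rather than being typed by hand.

```python
import csv
import io
from collections import Counter

# Hypothetical extracted entities: (document_id, chunk_text, label).
extracted = [
    ("note_001", "oxycodone", "opioid_drug"),
    ("note_001", "muscle aches", "general_symptoms"),
    ("note_002", "fentanyl", "opioid_drug"),
    ("note_002", "nausea", "general_symptoms"),
    ("note_002", "sweating", "general_symptoms"),
]


def to_csv(rows):
    """Serialize entity rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["document_id", "chunk", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


# Count how often each entity label occurs across all documents.
label_counts = Counter(label for _, _, label in extracted)

# Collect the opioid drugs mentioned in each document.
opioids_by_doc = {}
for doc_id, chunk, label in extracted:
    if label == "opioid_drug":
        opioids_by_doc.setdefault(doc_id, []).append(chunk)

print(to_csv(extracted))
print(label_counts)
print(opioids_by_doc)
```

Once entities live in rows like these, questions such as “which notes mention an opioid?” become simple filters and group-bys instead of manual chart review.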

John Snow Labs’ NER Visualizer provides a user-friendly interface for visualizing the results of NER models. It highlights and categorizes identified entities within the text. This tool allows users to see how the NER models extract and label entities, making it easier to understand and interpret the extracted data. The visualizer helps in validating the accuracy of the models, identifying patterns, and gaining insights from unstructured medical data, ultimately facilitating better data analysis and decision-making in healthcare.

The NerVisualizer highlights the named entities that are identified by ner_opioid and also displays their labels as decorations on top of the analyzed text.
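The core idea behind this kind of visualization is to use each entity’s character offsets to decorate the original text with its label. The plain-text sketch below shows that idea; it is not the NerVisualizer API, and the sentence and offsets are hypothetical. End offsets are treated as inclusive, which is the convention Spark NLP annotations follow, and entities are assumed not to overlap.

```python
from typing import List, Tuple


def annotate(text: str, entities: List[Tuple[int, int, str]]) -> str:
    """Wrap each entity span as [span](label); offsets are (begin, end inclusive, label)."""
    pieces = []
    cursor = 0
    # Walk entities left to right; assumes spans do not overlap.
    for begin, end, label in sorted(entities):
        pieces.append(text[cursor:begin])
        pieces.append(f"[{text[begin:end + 1]}]({label})")
        cursor = end + 1
    pieces.append(text[cursor:])
    return "".join(pieces)


sentence = "She stopped taking oxycodone and developed muscle aches."
# Hypothetical entity offsets for the sentence above.
entities = [(19, 27, "opioid_drug"), (43, 54, "general_symptoms")]

print(annotate(sentence, entities))
```

A graphical visualizer does the same thing with colored highlights and label badges instead of bracket markup.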


Assertion Status Detection

While identifying entities is crucial, understanding the context in which they appear is equally important. Three assertion models, detailed in the GitHub notebook, add this layer of comprehension by determining the certainty, condition, and relevance of the identified entities.

These models can distinguish between confirmed diagnoses, possible conditions, family history, and negated (absent) mentions (e.g., “The patient denies opioid use”). This helps ensure that only relevant and accurate information reaches downstream analysis, which improves data quality and lets healthcare providers focus on the most pertinent findings, making patient records and clinical decisions more accurate.
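In practice, assertion labels are typically used to filter or split entities before analysis. The sketch below keeps only entities asserted as present in the patient and groups the rest by status. The status names (present, absent, possible, family) and the example records are hypothetical stand-ins; the label sets of the actual assertion models in the notebook may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical entity records with assertion labels; the status names are
# illustrative and may not match the real assertion models' label sets.
records: List[Dict[str, str]] = [
    {"chunk": "oxycodone", "label": "opioid_drug", "assertion": "present"},
    {"chunk": "heroin", "label": "opioid_drug", "assertion": "absent"},        # e.g. "denies heroin use"
    {"chunk": "nausea", "label": "general_symptoms", "assertion": "possible"},
    {"chunk": "fentanyl", "label": "opioid_drug", "assertion": "family"},      # e.g. a relative's history
]


def filter_by_assertion(rows: Iterable[Dict[str, str]], keep: Iterable[str]) -> List[Dict[str, str]]:
    """Return only the rows whose assertion status is in `keep`."""
    allowed = set(keep)
    return [row for row in rows if row["assertion"] in allowed]


def group_by_assertion(rows: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each assertion status to the entity texts carrying it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row["assertion"]].append(row["chunk"])
    return dict(groups)


confirmed = filter_by_assertion(records, {"present"})
print([row["chunk"] for row in confirmed])
print(group_by_assertion(records))
```

Without this step, a negated mention such as “denies heroin use” would be counted the same way as an actual prescription.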

The Assertion Visualizer is a special type of NerVisualizer that additionally displays the assertion status inferred by a Healthcare NLP model on top of each labeled entity.

The AssertionVisualizer not only exhibits labeled entities but also overlays the assertion status inferred by a Spark NLP model atop them.


Relation Extraction

The complexity of opioid use disorder often involves intricate relationships between various entities. Relation extraction models are designed to identify and map these interconnections within the text. By extracting relationships between entities, such as “opioid_drug” and “general_symptoms,” these models create a detailed map of the factors contributing to the disorder. This relational data is invaluable for uncovering hidden patterns and correlations that are critical for epidemiological studies and intervention strategies.
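Relation extraction is commonly framed as classifying candidate pairs of entities: the system first enumerates pairs that could plausibly be related, and a trained model then predicts whether, and how, each pair is related. The sketch below covers only the candidate-generation step, keeping pairs that share a sentence, match an allowed label combination, and lie within a maximum character gap. The label pair reuses the two labels named above; the distance threshold, the sentence, and the entity offsets are illustrative assumptions, and the real models’ candidate logic may differ.

```python
from itertools import combinations
from typing import List, NamedTuple, Set, Tuple


class Entity(NamedTuple):
    text: str
    label: str
    sentence: int  # index of the sentence containing the entity
    begin: int     # character offset of the first character
    end: int       # character offset of the last character (inclusive)


def candidate_pairs(
    entities: List[Entity],
    allowed: Set[Tuple[str, str]],
    max_distance: int,
) -> List[Tuple[Entity, Entity]]:
    """Enumerate entity pairs that a relation classifier would then score."""
    ordered = sorted(entities, key=lambda e: e.begin)
    pairs = []
    for first, second in combinations(ordered, 2):
        # Only pair entities that occur in the same sentence.
        if first.sentence != second.sentence:
            continue
        # Only keep label combinations of interest, in either order.
        if (first.label, second.label) not in allowed and (second.label, first.label) not in allowed:
            continue
        # Skip pairs whose character gap exceeds the threshold.
        if second.begin - first.end > max_distance:
            continue
        pairs.append((first, second))
    return pairs


# Hypothetical entities from "She stopped taking oxycodone and developed
# muscle aches. Fentanyl was not used."
entities = [
    Entity("oxycodone", "opioid_drug", 0, 19, 27),
    Entity("muscle aches", "general_symptoms", 0, 43, 54),
    Entity("Fentanyl", "opioid_drug", 1, 57, 64),
]
allowed = {("opioid_drug", "general_symptoms")}

for a, b in candidate_pairs(entities, allowed, max_distance=30):
    print(f"{a.text} ({a.label}) <-> {b.text} ({b.label})")
```

Restricting candidates this way keeps the number of pairs the classifier must score small, at the cost of missing relations that span sentences or long distances.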

John Snow Labs’ Relation Extraction Visualizer provides a powerful tool for visualizing the relationships identified by relation extraction models. It shows the connections between entities within the text, and displaying these relationships in an intuitive, interactive way helps users understand complex data structures, validate the accuracy of the extracted relationships, and gain deeper insights into the underlying data. This tool is particularly useful for analyzing intricate patterns and interactions in healthcare data, supporting more informed clinical and research decisions.

The Relation Extraction Visualizer highlights the two entities engaged in a relation and displays their respective labels.


Comparison of GPT-4 and Healthcare NLP Results

Comparing the performance of large language models (LLMs) to specialized healthcare NLP NER models in extracting named entities from healthcare texts reveals notable differences. LLMs, such as GPT-4, are highly versatile and trained on vast datasets encompassing diverse topics, providing broad language understanding and contextual insight. However, healthcare NLP NER models are fine-tuned on domain-specific data, giving them a precise edge in identifying medical terminology, patient information, and clinical entities. While LLMs can effectively recognize general named entities and provide contextually rich responses, healthcare NLP NER models typically offer higher accuracy and reliability in extracting specific medical entities, making them more suitable for healthcare applications where precision is critical.

A medical doctor from John Snow Labs conducted a blind comparison of two models, GPT-4 and the Healthcare NLP ner_opioid model, on extracting opioid-related entities from text.

The table shows that the Healthcare NLP model extracted all the related entities correctly, while GPT-4 missed 4 out of 22 entities in the text.

GPT-4 vs. Healthcare NLP Results
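Read as an entity-level metric, this comparison measures recall: the fraction of the 22 reference entities each system extracted. The arithmetic is shown below using only the counts reported above. The source reports missed entities only, so precision (spurious or mislabeled extractions) cannot be computed from these figures.

```python
def recall(found: int, total: int) -> float:
    """Fraction of reference entities that a system extracted."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


TOTAL_ENTITIES = 22  # entities in the evaluated text, as reported

results = {
    "Healthcare NLP (ner_opioid)": recall(TOTAL_ENTITIES, TOTAL_ENTITIES),  # extracted all
    "GPT-4": recall(TOTAL_ENTITIES - 4, TOTAL_ENTITIES),                    # missed 4
}

for system, value in results.items():
    print(f"{system}: recall = {value:.1%}")
```

This gives 100.0% recall for ner_opioid and about 81.8% (18 of 22) for GPT-4 on this text.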


Conclusion

The opioid crisis presents a formidable challenge, necessitating sophisticated tools and approaches to uncover and understand its many facets. Healthcare NLP models, particularly those developed by John Snow Labs, offer a transformative solution by effectively processing and analyzing vast amounts of unstructured medical data. Through advanced techniques such as Named Entity Recognition (NER) and assertion status detection, these models provide detailed, accurate, and contextual insights into opioid use disorder. NER models identify and categorize key entities like drug names, dosages, symptoms, and diagnoses with very high accuracy, while assertion status models clarify the context and certainty of these mentions. This combination ensures that healthcare professionals have access to precise and relevant information, enabling better clinical decision-making, targeted interventions, and informed public health strategies.

In short, LLMs excel at broad entity recognition and contextual understanding, but specialized healthcare NER models typically achieve greater precision in identifying specific medical terms. This higher accuracy makes healthcare NER models preferable for medical applications where exact entity extraction is crucial.

By harnessing the power of these NLP tools, we can gain a deeper understanding of the opioid epidemic and develop more effective approaches to combat this pressing public health issue.
