Register for the 5th NLP Summit, a Free Online Conference on Sep 24-26. Register now.
was successfully added to your cart.

State-of-the-art RxNorm Code Mapping with NLP: Comparative Analysis between the tools by John Snow Labs, Amazon, and GPT-4

This blog post compares RxNorm code mapping accuracy and a price analysis between John Snow Labs, GPT-4, and Amazon Comprehend Medical.


The Power of Standardized Medication Data

RxNorm, developed by the National Library of Medicine, is a standardized naming system for clinical drugs. It assigns unique identifiers to medications, ensuring consistency and interoperability across electronic health records and healthcare systems. This facilitates clear communication, supports clinical decision-making, and enhances regulatory compliance.

Accurately mapping medications to RxNorm codes is crucial for several reasons:

  • Safer Patient Care: Precise mapping helps identify potential drug interactions and allergies, preventing adverse events for patients.
  • Improved Billing and Reimbursement: Standardized medication codes ensure accurate billing and streamline the reimbursement process for healthcare providers.
  • Enhanced Research: Consistent medication mapping allows researchers to analyze large datasets effectively, leading to a better understanding of diseases and treatment outcomes.
  • Streamlined Public Health Initiatives: Mapping facilitates efficient tracking of medication use and adverse effects, aiding public health agencies in monitoring and responding to outbreaks or drug safety concerns.

Overall, it enhances communication, patient safety, research quality, and regulatory compliance.

In unstructured clinical datasets, the initial step involves extracting medication entities before mapping them to their corresponding RxNorm codes. This task can be achieved through various methods and tools. Here, we compared the following tools for medication entity recognition and mapping RxNorm codes:

  • Healthcare NLP: A specialized library developed by John Snow Labs, built on the Apache Spark framework, offers pre-trained clinical pipelines and models for various Natural Language Processing (NLP) tasks in healthcare, including medication recognition and mapping.
  • GPT-4: This powerful large language model (LLM) is known for its ability to process and generate human-like text. While not specifically designed for healthcare, its capabilities might apply to tasks like medication mapping.
  • Amazon Comprehend Medical: This cloud-based NLP service by Amazon Web Services (AWS) is trained on medical text data. It provides functionalities like entity recognition, including medication extraction and potential linking to standard codes.


Establishing the Ground Truth and Evaluation Framework

To ensure a fair comparison of these tools, we enlisted the assistance of human annotators. Medical annotation experts from John Snow Labs utilized the Generative AI Lab to annotate 79 clinical in-house documents. They meticulously labeled medication entities and assigned their respective RxNorm codes. Subsequently, we constructed a ground truth dataset based on these annotations.

Dataset annotations on the Generative AI Lab

Ground Truth Dataframe


John Snow Labs RxNorm Resolver Models

John Snow Labs offers a comprehensive suite of RxNorm Resolver models. With over 80 pretrained entity resolution models, it covers many medical terminologies including RxNorm, ICD-10-CM, SNOMED, UMLS, CPT-4, MedDRA, NDC, HPO, and more. These models are trained with various dataset sizes and types of embedding models, ensuring diversity and allowing users to select the most suitable model for their specific datasets.

Pretrained entity resolution models in John Snow Labs

If you prefer not to develop a custom entity resolution pipeline, you can utilize John Snow Labs’ pretrained pipelines with just a single line of code.

Pretrained pipelines for entity resolution in John Snow Labs

For this task, we employed two distinct RxNorm models within John Snow Labs:

To utilize these models, you must obtain a license for John Snow Labs, which you can request from here.


GPT-4 Models

We set up an OpenAI API account to obtain RxNorm mapping predictions using the GPT-4 Turbo and GPT-4o models. Remember to add credits to your account before utilizing this API.

Amazon Comprehend Medical

To use Amazon Comprehend Medical, you will need an Amazon account and will utilize your AWS credentials for accessing this service.

Results And Analysis

Let’s examine the outputs of these three tools initially.

We processed the data through both Spark NLP for Healthcare entity resolution models and obtained the predictions. Below, you can view the structure of the results, which includes the RxNorm code (code), its resolution (resolution), all the closest RxNorm codes (all_codes), resolutions of all the closest results (all_resolutions), distances of all results (all_distances), and concept classes of all results (all_k_aux_labels).

sbiobertresolve_rxnorm_augmented model output

biolordresolve_rxnorm_augmented model output

We utilized the API to obtain RxNorm predictions from GPT-4 models. As per our prompt, the model returned the RxNorm code (rxnorm_code) and its resolution (description).

GPT-4 output

We obtained predictions through the API to retrieve RxNorm codes from Amazon Comprehend Medical. The service provided the following outputs: RxNorm code (aws_code), resolution (aws_description), all closest results (all_codes), and resolutions of all closest results (all_descriptions).

Amazon Comprehend Medical service output

As observed, John Snow Labs models return up to 25 closest results, and Amazon Medical Comprehend returns up to five results, both sorted starting from the closest one. In contrast, the GPT-4 returns only one result.

Consequently, we adopted two approaches for evaluating these tools, given that the model outputs may not precisely match the annotations:

  • Top 3: Compare the annotations to see if they appear in the first three results.
  • Top 5: Compare the annotations to see if they appear in the first five results.

These approaches help assess the accuracy and performance of the tools in retrieving RxNorm codes from clinical text.

Analyzing Performance Disparities and Revealing Constraints

Based on the evaluation methodologies, these are the findings:

Top-3 Results:

Top-3 Comparison Results

Top-5 Results:

Top-5 Comparison Results

Price Analysis of the Tools

Since we don’t have such a small dataset in the real world, we calculated the price of these tools according to 1M clinical notes.

Open AI Pricing: We created a prompt to achieve better results, which costs $3.476 on GPT-4 and $1.738 GPT-4o model for the 79 documents. This means that for processing 1 million notes, the estimated cost would be $44,000 for the GPT-4 Turbo and $22,000 for the GPT-4o.

Amazon Comprehend Medical Pricing: According to the price calculator, obtaining RxNorm predictions for 1M documents, with an average of 9,700 characters per document, costs $24,250.

Healthcare NLP Pricing: When using John Snow Labs-Healthcare NLP Prepaid product on an EC2-32 CPU (c6a.8xlarge at $1,2 per hour) machine, obtaining the RxNorm codes for medications (excluding the NER stage) from approximately 80 documents takes around 2 minutes. Based on this, processing 1M documents and extracting RxNorm codes would take about 25,000 minutes (416 hours, or 18 days), costing $500 for infrastructure and $4,000 for the license (considering a 1-month license price of $7,000). Thus, the total cost for Healthcare NLP is approximately $4,500.



Based on the evaluation results:

  • The sbiobertresolve_rxnorm_augmented model of Spark NLP for Healthcare consistently provides the most accurate results in each top_k comparison.
  • The biolordresolve_rxnorm_augmented model of Spark NLP for Healthcare outperforms Amazon Comprehend Medical and GPT-4 in mapping terms to their RxNorm codes.
  • The GPT-4 could only return one result, which is reflected similarly in both charts and has proven to be the least accurate.

If you want to process 1M documents and extract RxNorm codes for medication entities (excluding the NER stage), the total cost:

  • With Healthcare NLP is about $4,500, including the infrastructure costs.
  • $24,250 with Amazon Comprehend Medical
  • $44,000 with the GPT-4 (Turbo) and $22,000 with the GPT-4o.

Therefore, Healthcare NLP is almost 5 times cheaper than its closest alternative, not to mention the accuracy differences (Top 3: Healthcare NLP 82.7% vs Amazon 55.8% vs GPT-4 8.9%).


Accuracy & Cost Table


Try Healthcare NLP

See in action

Using Contextual Assertion for Clinical Text Analysis: A Comprehensive Guide

This blog post explores using Healthcare NLP, a powerful NLP library, for clinical text analysis. It focuses on Contextual Assertion, which significantly...