How Natural Language Processing Is Helping Doctors Make Better Diagnoses

NLP is driving innovation across many industries. In the healthcare industry, NLP techniques can be applied in many ways:

  • Creating personalized patient experience
  • Streamlining drug discovery and development
  • Accurately diagnosing diseases at early stages
  • Deidentifying clinical notes in various languages
  • Detecting relations between body parts and clinical entities
  • Detecting drug chemicals, and identifying relations between drugs and adverse events
  • Detecting lab results and clinical events
  • Detecting clinical entities in radiology reports
  • Identifying demographic information from oncology texts, etc.

With the support of NLP, doctors are better equipped to make diagnoses and analyze patients’ conditions.

Significance of NLP in Healthcare

Let’s discuss some uses of NLP in Healthcare.

NLP in Electronic Health Records (EHR)

Electronic Health Records (EHRs) are digital versions of a patient’s medical history. They contain key administrative and clinical data such as demographics, medications, treatment plans, immunization dates, allergies, laboratory test results, and radiology images. They are real-time records maintained by a patient’s provider over time.

Physicians spend a great deal of time populating EHRs with large volumes of unstructured patient data. They enter the how and the why of their patients’ conditions into chart notes, and those notes go into the Electronic Health Records as free text. Despite the effort that goes into writing them, the notes are not structured in a way that a computer can analyze effectively.

Structured data, such as that exposed through C-CDA documents and FHIR APIs, can help determine the disease, but it gives only a limited view of the actual patient record.

Without Natural Language Processing, unstructured data is of little use to computer-based algorithms. Healthcare NLP saves physicians time and effort and makes the information usable by:

  • Using specialized engines that scrub large sets of unstructured data and discover improperly coded or previously missed patient conditions.
  • Converting the data into a structured format so the health systems can classify patients and summarize their condition on arrival.
  • Allowing physicians to extract critical insights rather than spending time reviewing complex EHRs.
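
As a rough illustration of converting free text into a structured format, the sketch below pulls a few fields out of a chart note. It uses plain regular expressions from the Python standard library rather than a trained clinical model, and the note text, field names, and `structure_note` helper are all hypothetical.

```python
import re

# Hypothetical free-text chart note; not real patient data.
NOTE = (
    "Patient is a 54-year-old male presenting with chest pain. "
    "BP 150/95. Started on aspirin 81 mg daily."
)

def structure_note(text):
    """Pull a few fields out of a free-text note with simple patterns."""
    age = re.search(r"(\d{1,3})-year-old", text)
    sex = re.search(r"\b(male|female)\b", text, re.IGNORECASE)
    bp = re.search(r"\bBP\s+(\d{2,3})/(\d{2,3})", text)
    # A word followed by a dose in mg, e.g. "aspirin 81 mg".
    meds = re.findall(r"\b([A-Za-z]+)\s+(\d+)\s*mg\b", text)
    return {
        "age": int(age.group(1)) if age else None,
        "sex": sex.group(1).lower() if sex else None,
        "systolic": int(bp.group(1)) if bp else None,
        "diastolic": int(bp.group(2)) if bp else None,
        "medications": [{"name": n.lower(), "dose_mg": int(d)} for n, d in meds],
    }

record = structure_note(NOTE)
print(record)
```

Hand-written patterns like these break as soon as the wording changes, which is the gap that trained clinical NLP models are meant to close.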

NLP in EHR also saves costs and extracts data for an in-depth big data analysis. The major applications of NLP in Electronic Health Records are:

  • Extracting information from clinical notes – NLP extracts critical information such as diagnoses, timelines, recommendations, and various hypothetical symptoms from clinical texts.
  • Boosting phenotyping capabilities – Phenotypes are physical/physiological characteristics of an organism that can be related to behavior, biological processes, or physical appearance. Most analysts use structured data for phenotyping because it is easy to extract; NLP makes the detail in unstructured notes available as well. Linguamatics, an NLP text mining platform, has partnered with Stead Family Children’s Hospital, Iowa, to use NLP for improving phenotype extraction for precision medicine.
  • Finding patient cohorts for clinical trials – Premier Applied Sciences® (PAS) has teamed with Clinithink, a healthcare technology startup, to bring NLP to investigators, trial sponsors, and other research organizations that can benefit from it.
  • Visualizing data for chart review – Harvard Medical School’s Translational Data Science Center for a Learning Health System has collaborated with VERITY Bioinformatics researchers to develop the Chart Review Tool Powered by NLP (CHANL). This tool makes chart review of narrative text notes from EHRs easier.

NLP in Clinical Trial Management

Clinical trials are research studies that aim to evaluate how new medical approaches work on humans. They are important for examining the safety and efficacy of new treatments. Conducting clinical trials is expensive and time-consuming because each trial consists of several phases. Though a portion of a clinical trial report contains well-structured information that is searchable using keywords, most of the information is still buried in an unstructured format.

In a typical clinical trial scenario, the staff has to manually find patients who meet complex I/E (Inclusion/Exclusion) criteria. The inclusion criteria identify the characteristics the participants must have if they intend to join the study. Exclusion criteria, on the other hand, are the characteristics that disqualify a participant from joining a study.
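
The inclusion/exclusion logic itself is simple to express in code. The sketch below is a minimal illustration under stated assumptions: the `Patient` class, the criteria, and the sample patients are invented, and in a real pipeline the `conditions` set would be filled by NLP over clinical notes rather than typed in by hand.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    patient_id: str
    age: int
    conditions: set = field(default_factory=set)

# Hypothetical criteria for an imaginary trial.
INCLUSION = [
    lambda p: 18 <= p.age <= 75,
    lambda p: "type 2 diabetes" in p.conditions,
]
EXCLUSION = [
    lambda p: "pregnancy" in p.conditions,
    lambda p: "chronic kidney disease" in p.conditions,
]

def is_eligible(patient):
    """Eligible = meets every inclusion criterion and no exclusion criterion."""
    meets_all_inclusion = all(rule(patient) for rule in INCLUSION)
    hits_any_exclusion = any(rule(patient) for rule in EXCLUSION)
    return meets_all_inclusion and not hits_any_exclusion

patients = [
    Patient("P1", 52, {"type 2 diabetes"}),
    Patient("P2", 44, {"type 2 diabetes", "chronic kidney disease"}),
    Patient("P3", 81, {"type 2 diabetes"}),
]
eligible_ids = [p.patient_id for p in patients if is_eligible(p)]
print(eligible_ids)
```

Here P2 is excluded by a disqualifying condition and P3 falls outside the age range, so only P1 remains.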

Identifying patients manually places a heavy burden on staff, as the process is long, complicated, and costly. NLP technology automates and simplifies the patient identification process.

By applying the I/E criteria to Electronic Medical Record (EMR) data, NLP reads the clinical data and rapidly identifies the right types of patients to enroll in a clinical trial. Clinical NLP can process up to two million documents per hour, which means the technology can accurately identify many more eligible patients in much less time.

Natural language processing in healthcare also allows trial developers to assess the suitability of a site based on factors such as:

  • Investigator’s availability
  • Historical performance metrics
  • Experience in therapy area

Based on these factors, the sites that outperform the expected site metrics can be selected for the clinical trials.
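
A minimal sketch of that selection step follows. The site records, the metric names, and the expected enrollment rate are all invented for illustration; the article does not specify which metrics are compared or how.

```python
# Hypothetical site records; every value here is invented for illustration.
SITES = [
    {"name": "Site A", "investigator_available": True, "enrollment_rate": 3.1, "trials_in_area": 6},
    {"name": "Site B", "investigator_available": True, "enrollment_rate": 1.4, "trials_in_area": 9},
    {"name": "Site C", "investigator_available": False, "enrollment_rate": 2.8, "trials_in_area": 4},
    {"name": "Site D", "investigator_available": True, "enrollment_rate": 2.2, "trials_in_area": 2},
]

EXPECTED_ENROLLMENT_RATE = 2.0  # assumed target, patients enrolled per month

def select_sites(sites, expected_rate):
    """Keep available sites that beat the expected rate, best performers first."""
    candidates = [
        s for s in sites
        if s["investigator_available"] and s["enrollment_rate"] > expected_rate
    ]
    # Rank by historical performance, then by experience in the therapy area.
    return sorted(
        candidates,
        key=lambda s: (s["enrollment_rate"], s["trials_in_area"]),
        reverse=True,
    )

selected = [s["name"] for s in select_sites(SITES, EXPECTED_ENROLLMENT_RATE)]
print(selected)
```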

Bristol-Myers Squibb (BMS) conducted a clinical trial to learn more about patient stratification for heart failure risk. The researchers collected imaging data and EHR data from roughly nine hundred patients and used NLP to collect information on 40 elements related to clinical outcomes, phenotypes, patient demographics, etc. They used this information to classify patients into four different groups.

NLP in Drug Discovery and Development

NLP in healthcare is playing a crucial role in accelerating small molecule drug discovery. Instead of manually looking for information in large databases, we can use medical NLP to identify and extract valuable data. Scientists can use NLP tools to find previously unknown chemical reactions and can restart experiments based on the results. Saved results from past experiments act as training data for Machine Learning models, which can then extract meaningful data within seconds instead of hours.

Transformer architectures are of great use in NLP here, as they help models learn chemical structure from text-based representations of molecules and perform tasks including molecular optimization and reaction prediction.
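
Transformer models operate on sequences of tokens, so a molecule written as text first has to be split into tokens. The sketch below assumes SMILES as the text representation, which the article does not name, and uses a deliberately simplified tokenizer; `tokenize_smiles` is a hypothetical helper, not part of any library.

```python
import re

# Simplified SMILES tokens: bracket atoms, two-letter halogens,
# then any other single character (atoms, bonds, ring digits, branches).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)

aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize_smiles(aspirin)
print(tokens)

# Map each distinct token to an integer id, as a model's vocabulary would.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]
print(token_ids)
```

The two-letter and bracket patterns come first in the alternation so that `Cl` is kept as one chlorine token instead of being read as a carbon followed by a stray `l`.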

Transformer-based NLP models also predict the structure and function of biomolecules like proteins. They learn representations of protein sequences and provide powerful embeddings for use in AI tasks such as:

  • Design of a protein structure
  • Prediction of the final folded structure of a protein
  • Understanding of the protein-small molecule interactions
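
Before a model can produce embeddings for a protein, the amino-acid sequence has to be turned into integer ids. The sketch below shows one simple way to do that with the 20 standard one-letter residue codes; the id scheme, the padding convention, and the `encode_protein` helper are assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
PAD_ID, UNK_ID = 0, 1                 # assumed special ids for this sketch
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}

def encode_protein(sequence, max_len):
    """Encode a protein sequence as fixed-length integer ids."""
    # Unknown or non-standard residues (e.g. "X") map to UNK_ID.
    ids = [VOCAB.get(res, UNK_ID) for res in sequence.upper()[:max_len]]
    # Right-pad so every encoded sequence has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))

encoded = encode_protein("MKTAYX", max_len=8)
print(encoded)
```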

NLP also aids various processes in the drug development lifecycle such as:

  • Gene-disease mapping – When discovering a new medicine, the first step is to identify the biological origin of the disease, which requires a comprehensive understanding of the genes involved in the disease pathway. NLP-based text mining rapidly accesses and analyzes the relevant information required to identify targets.
  • Adverse drug events detection – NLP is used to detect specific adverse events like nosocomial infections and outperforms traditional adverse event detection methods.

Apart from these uses, NLP is used for Patient care and Monitoring, Precision Medicine, and Biomedical research.

Role of NLP in Medical Diagnosis & Procedures

Unstructured data contains plenty of information that can play a significant role in improving patient monitoring and decision making. Natural Language Processing sorts through unstructured data and helps healthcare providers improve patient care and disease diagnosis.

Clinical assertion modeling helps healthcare providers by:

  • Analyzing clinical notes
  • Determining whether the patient is suffering from a problem
  • Identifying whether the problem is present, absent, or conditional

When a patient describes symptoms, the doctor records them in clinical notes. A combination of Named Entity Recognition (NER) and text classification can then analyze the notes and label the symptoms as Problem entities. Each problem can be further categorized by its assertion status (present, conditional, or absent). This way, NLP enables physicians to optimize patient care and monitoring by showing which problems are most pressing and require immediate treatment.
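
The two-step idea, find Problem entities and then assign each an assertion status, can be sketched with simple rules. This is a toy stand-in for trained NER and classification models: the term list, the cue phrases, and both helper functions are hypothetical and cover only the sample note.

```python
import re

# Hypothetical mini-lexicon of problem terms; a trained NER model would replace it.
PROBLEM_TERMS = ["chest pain", "fever", "shortness of breath"]

# Assumed cue phrases for this sketch only.
ABSENT_CUES = ["no", "denies", "without"]
CONDITIONAL_CUES = ["on exertion", "when", "if"]

def has_cue(cues, text):
    """True if any cue appears in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(c) + r"\b", text) for c in cues)

def assert_status(sentence, mention_start):
    """Classify one mention as absent, conditional, or present."""
    if has_cue(ABSENT_CUES, sentence[:mention_start]):  # negation before mention
        return "absent"
    if has_cue(CONDITIONAL_CUES, sentence):
        return "conditional"
    return "present"

def find_problems(text):
    """Return (problem, status) pairs, processed sentence by sentence."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        for term in PROBLEM_TERMS:
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if match:
                results.append((term, assert_status(sentence, match.start())))
    return results

note = "Patient reports chest pain. Denies fever. Shortness of breath on exertion."
problems = find_problems(note)
print(problems)
```

Keyword cues like these miss most real phrasing, which is why the classification step is normally learned from annotated clinical text.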

The notable uses of NLP in Medical Diagnoses and Procedures are:

  • Detecting clinical entities in text – We can automatically detect more than 50 clinical entities using the NER deep learning model. The common types of clinical entities are “test”, “problem”, and “treatment”. Clinical Named Entity Recognition (CNER) locates and classifies clinical terminologies into predefined categories, such as disease disorder, severity, diagnostic procedure, medication, medication dosage, sign symptoms, etc.
  • Identifying Assertion Status for diagnosis and symptoms – NLP automatically detects if a diagnosis or a symptom is present, absent, possible, planned, past, none, hypothetical, related to a family member, or related to someone else.

Assertion status helps identify whether the information is described as happening in the present, past, or future, or whether it is only possible. Conditions that are PRESENT need to be taken into consideration when making prescriptions, whereas a condition labeled PAST can be counted as medical history.

  • Detecting diagnosis and procedures – We can automatically identify diagnoses and procedures in clinical documents using the pretrained Spark NLP clinical models. Getting the right diagnosis is a key aspect of healthcare, as it explains a patient’s health problem and informs subsequent healthcare decisions. Delay in disease diagnosis can be harmful as it prevents the patient from receiving appropriate and timely treatment.
  • Detecting temporal relations for clinical events – We can automatically identify three types of relations between clinical events: After, Before and Overlap using the pre-trained clinical Relation Extraction (RE) model. Temporal relations between clinical events play a crucial role in clinical assessment and decision making. The detection of these relations augments the value of EHRs for understanding disease progression and patients’ responses to treatments. For instance, the sentence “The patient underwent the surgery on Monday.” contains an event mention “surgery”, a time expression “Monday”, and a temporal relation of type “overlap” between the event mention and the time expression.
  • Detecting causality between symptoms and treatment – We can automatically identify relations between symptoms and treatment using the pre-trained clinical Relation Extraction (RE) model.

In the above figure, we see various relations between entities. For instance, “respiratory tract infection” is a PROBLEM and “amoxicillin” is a TREATMENT administered for this medical problem. These entities are related to each other via the TrAP relation.

Each relation has a particular meaning shown by the table below.

  • Detecting relations between body parts and clinical entities – We can automatically identify relations between body parts and symptoms/diagnoses using the pre-trained clinical Relation Extraction (RE) model.

In the above figure, we see that the entities are connected to each other via relations labeled 0 or 1. A label of 0 specifies that the entities are not related, and 1 specifies that the two entities are related to each other in a certain manner. For instance, No “pathological changes” to the “bony thorax” means that the symptom is not related to the respective organ.

  • Detecting clinical entities such as problems, tests and treatments, and determining how they relate to specific dates – The element of time is of major importance in a diagnostic procedure. When a diagnosis is made in a timely manner, a patient has the opportunity for a positive health outcome as clinical decision making will be tailored to a correct understanding of the patient’s health problem.
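
To make the last item concrete, the sketch below links entities to a date mentioned in the same sentence. It is a simplified illustration, not how a trained relation extraction model works: the entity lexicon, the ISO date format, the same-sentence rule, and the sample note are all assumptions.

```python
import re
from datetime import date

# Hypothetical entity lexicon; a clinical NER model would supply these labels.
ENTITY_TERMS = {
    "chest x-ray": "TEST",
    "pneumonia": "PROBLEM",
    "amoxicillin": "TREATMENT",
}
# Assumes dates are written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def link_entities_to_dates(text):
    """Pair each known entity with the first date found in its sentence."""
    links = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = DATE_PATTERN.search(sentence)
        if not found:
            continue  # no date in this sentence, nothing to link
        when = date(*map(int, found.groups()))
        lowered = sentence.lower()
        for term, label in ENTITY_TERMS.items():
            if term in lowered:
                links.append((term, label, when))
    return links

note = (
    "Chest x-ray on 2023-03-01 showed pneumonia. "
    "Amoxicillin was started on 2023-03-02."
)
links = link_entities_to_dates(note)
for term, label, when in links:
    print(f"{label:<10}{term:<14}{when.isoformat()}")
```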


In this article, we discussed how NLP helps doctors make better diagnoses. In the healthcare industry, NLP has major uses in Electronic Health Records, clinical trial management, and drug discovery and development. It improves clinical documentation and patient-provider interactions with EHRs. It helps patients understand their symptoms and gain more knowledge about their conditions, and patients who are well aware of their health conditions can make informed decisions. Further, we discussed the notable uses of NLP in medical diagnoses and procedures, such as detecting clinical entities in text, identifying assertion status for diagnoses and symptoms, and detecting temporal relations for clinical events.

Healthcare NLP is the most widely used library in the healthcare industry. It comes with 700+ pre-trained clinical models that are developed and trained to solve real-world problems in the healthcare domain at scale.
