
Spark NLP for Healthcare 2.7.3 with BioBERT Extraction Models, Higher Accuracy, De-Identification, New Radiology NER Model & More

Spark NLP for Healthcare 2.7.3 is available now! If you’re an existing customer, expect an email with upgrade instructions.

In this release we introduce the following new models, features, and improvements:

  • Massive improvements & feature enhancements in De-Identification module
  • A brand-new RelationExtractionDL Annotator
  • Drug Normalizer
  • Confidence Scores in assertion output
  • Cosine similarity metrics in entity resolvers
  • AuxLabel in the metadata of entity resolvers
  • New Relation Extraction models
  • New Entity Resolver models
  • New Clinical Pretrained NER model
  • Bug fixes & general improvements
  • Version aligned with Spark NLP open-source v2.7.3


Improvements in De-Identification Module


Faker Library

This release integrates the ‘faker’ library into the De-Identification module to automatically generate random replacement data such as names, dates, and addresses, so users no longer have to supply dummy data (custom obfuscation files can still be used). The larger pool of random values also improves obfuscation results.
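The idea behind pool-based obfuscation can be illustrated with a minimal, stdlib-only sketch. It assumes entities have already been detected as `(start, end, label)` character spans; the `obfuscate` function and the small `POOLS` lists are hypothetical stand-ins for the faker generators and are not the Spark NLP for Healthcare API. Each distinct original value is mapped to one random replacement, so repeated mentions of the same person stay consistent.

```python
import random

# Hypothetical replacement pools standing in for faker-generated values.
POOLS = {
    "NAME": ["Alice Moreau", "Ravi Patel", "Jonas Berg", "Mei Lin"],
    "DATE": ["03/14/2001", "11/02/1998", "07/30/2010"],
}


def obfuscate(text, spans, pools=POOLS, seed=0):
    """Replace each (start, end, label) span with a random value from its pool.

    Assumes non-overlapping spans. The same (label, original string) always
    receives the same fake value, so coreferent mentions stay linked.
    """
    rng = random.Random(seed)
    mapping = {}
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        key = (label, text[start:end])
        if key not in mapping:
            mapping[key] = rng.choice(pools[label])
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(mapping[key])        # fake value in place of the entity
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


note = "John Smith was seen on 01/05/2020. John Smith is stable."
spans = [(0, 10, "NAME"), (23, 33, "DATE"), (35, 45, "NAME")]
result, mapping = obfuscate(note, spans)
print(result)
```

Seeding the generator keeps the sketch reproducible for testing; a real de-identification run would normally not reuse a fixed seed.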


How to use


Structured De-Identification Module

This release adds a new annotator for de-identifying structured data. It lets users map columns to an obfuscation policy, and optionally supply dummy data mapped to the columns whose values should be replaced.
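A column-to-policy mapping of this kind can be sketched as follows. The policy names (`"mask"`, `"obfuscate"`), the `deidentify_rows` function, and the dummy data are hypothetical; they illustrate mapping columns to an obfuscation policy and to user-supplied dummy values, not the actual annotator's parameters.

```python
import random

# Hypothetical column -> policy mapping. "obfuscate" draws from the dummy data
# mapped to that column; "mask" replaces every character with "*".
COLUMN_POLICY = {"PATIENT": "obfuscate", "SSN": "mask"}
DUMMY_DATA = {"PATIENT": ["Maria Lopez", "Tom Becker", "Aiko Sato"]}


def deidentify_rows(rows, policy=COLUMN_POLICY, dummy=DUMMY_DATA, seed=0):
    """Return de-identified copies of `rows`; columns without a policy are kept."""
    rng = random.Random(seed)
    seen = {}  # (column, original value) -> replacement, for consistency across rows
    out = []
    for row in rows:
        new_row = dict(row)  # never modify the caller's data
        for column, rule in policy.items():
            if column not in row:
                continue
            value = str(row[column])
            if rule == "mask":
                new_row[column] = "*" * len(value)
            elif rule == "obfuscate":
                key = (column, value)
                if key not in seen:
                    seen[key] = rng.choice(dummy[column])
                new_row[column] = seen[key]
            else:
                raise ValueError(f"unknown policy {rule!r} for column {column!r}")
        out.append(new_row)
    return out


rows = [
    {"PATIENT": "John Smith", "SSN": "123-45-6789", "AGE": 64},
    {"PATIENT": "John Smith", "SSN": "123-45-6789", "AGE": 64},
]
for r in deidentify_rows(rows):
    print(r)
```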


How to use





Introducing a SOTA Relation Extraction Model Using BioBERT

This release adds a brand-new relation extraction model based on BioBERT and trained end to end, which delivers substantial accuracy improvements. A new annotator, ReChunksFilter, was also developed for this model so that syntactic features work well with BioBERT when extracting relations.
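The role of a chunk filter placed in front of the relation classifier can be sketched as follows, assuming it keeps only candidate pairs whose entity types are in an allowed list and whose entities lie within a maximum distance of each other. This is a hypothetical, simplified stand-in for ReChunksFilter, not its implementation: `filter_pairs`, `allowed_pairs`, and `max_distance` are illustrative names, and plain token distance replaces the syntactic distance the real annotator is meant to exploit.

```python
from itertools import combinations
from typing import NamedTuple


class Chunk(NamedTuple):
    text: str
    label: str
    token_index: int  # index of the chunk's first token in the sentence


def filter_pairs(chunks, allowed_pairs, max_distance):
    """Keep only chunk pairs worth sending to the relation classifier.

    A pair survives if its entity types appear in `allowed_pairs` (in either
    order) and the chunks are at most `max_distance` tokens apart. Token
    distance is a simplification standing in for syntactic distance.
    """
    allowed = set(allowed_pairs) | {(b, a) for a, b in allowed_pairs}
    kept = []
    for a, b in combinations(chunks, 2):
        if (a.label, b.label) not in allowed:
            continue
        if abs(a.token_index - b.token_index) > max_distance:
            continue
        kept.append((a, b))
    return kept


# "This 73 y/o patient had CT on 1/12/95 , with progressive memory and
#  cognitive decline since 8/11/94 ." split on whitespace (tokens 0..17).
CHUNKS = [
    Chunk("CT", "Test", 5),
    Chunk("1/12/95", "Date", 7),
    Chunk("progressive memory and cognitive decline", "Problem", 10),
    Chunk("8/11/94", "Date", 16),
]
ALLOWED_PAIRS = [("Date", "Test"), ("Date", "Problem")]

for a, b in filter_pairs(CHUNKS, ALLOWED_PAIRS, max_distance=6):
    print(f"{a.text!r} <-> {b.text!r}")
```

Filtering before classification means the expensive BERT model only scores plausible pairs: here the Test/Date pair that is 11 tokens apart is dropped along with the disallowed Test/Problem and Date/Date pairs.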


How to use


Drug Normalizer

This release adds Drug Normalizer, which standardizes drug units and handles abbreviations in raw text or in drug chunks identified by any NER model. This normalization significantly improves the performance of entity resolvers.
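A rule-based sketch of this kind of normalization is shown below. It assumes regular-expression rules that put a space between a dose and its unit, map spelled-out units to their abbreviations, and expand common dosage-form abbreviations. The rule tables and the `normalize_drug` function are hypothetical and far from complete; the real Drug Normalizer's rules may differ. The point is that a normalized string such as "81 mg tablet" is easier for a resolver to match than raw variants like "81MG tab".

```python
import re

# Hypothetical rule tables for illustration only.
ABBREVIATIONS = {
    "tab": "tablet",
    "tabs": "tablets",
    "cap": "capsule",
    "caps": "capsules",
    "inj": "injection",
    "soln": "solution",
}
UNIT_SYNONYMS = {
    "milligram": "mg",
    "milligrams": "mg",
    "microgram": "mcg",
    "micrograms": "mcg",
    "milliliter": "ml",
    "milliliters": "ml",
}

# Longest units first so "milligrams" wins over shorter alternatives.
_UNITS = sorted(set(UNIT_SYNONYMS) | set(UNIT_SYNONYMS.values()), key=len, reverse=True)
_DOSE = re.compile(r"(\d+(?:\.\d+)?)\s*(" + "|".join(_UNITS) + r")\b", re.IGNORECASE)
_ABBR = re.compile(r"\b(" + "|".join(ABBREVIATIONS) + r")\b", re.IGNORECASE)


def normalize_drug(text):
    """Standardize dose units and expand dosage-form abbreviations."""

    def fix_dose(match):
        unit = match.group(2).lower()
        return f"{match.group(1)} {UNIT_SYNONYMS.get(unit, unit)}"

    text = _DOSE.sub(fix_dose, text)  # "81MG" -> "81 mg", "40 Milligrams" -> "40 mg"
    text = _ABBR.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)  # "tab" -> "tablet"
    return re.sub(r"\s+", " ", text).strip()


print(normalize_drug("Aspirin 81MG tab daily"))
print(normalize_drug("Lasix 40 Milligrams inj"))
```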


How to use


Confidence Scores in Assertion Model Output

Like NER models, assertion models now provide a confidence score for each prediction.
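One common way a classifier produces a per-prediction confidence is the softmax probability of the winning class. The sketch below assumes exactly that; it does not claim the assertion models compute their score this way, and the label set and `predict_with_confidence` function are illustrative.

```python
import math

# Illustrative label set; the released assertion models may use different labels.
LABELS = ["present", "absent", "possible"]


def predict_with_confidence(scores, labels=LABELS):
    """Return (label, confidence) for one prediction.

    `scores` holds one raw score (logit) per label. The confidence is the
    softmax probability of the highest-scoring label.
    """
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    best = exps.index(max(exps))
    return labels[best], exps[best] / sum(exps)


label, confidence = predict_with_confidence([2.0, 0.0, 0.0])
print(label, round(confidence, 3))

# Example downstream use: keep only confident assertions.
predictions = [[2.0, 0.0, 0.0], [0.1, 0.0, 0.05]]
confident = [p for p in map(predict_with_confidence, predictions) if p[1] >= 0.7]
print(confident)
```

Having a score per prediction makes threshold-based filtering like the last step possible, just as with NER confidence.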


New Relation Extraction Models

This release introduces three new relation extraction models between body parts and clinical entities, and one new relation extraction model between dates and clinical entities. These models use a binary relation extraction approach (each one predicts whether or not a relation holds between a given pair of entities), which improves accuracy.

  • re_bodypart_direction : Relation Extraction between Body Part and Direction entities.


Text: “MRI demonstrated infarction in the upper brain stem, left cerebellum and right basil ganglia”

  • re_bodypart_problem : Relation Extraction between Body Part and Problem entities.


Text: “No neurologic deficits other than some numbness in his left hand.”

  • re_bodypart_proceduretest : Relation Extraction between Body Part and Procedure or Test entities.


Text: “TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound.”

  • re_date_clinical : Relation Extraction between Date and different clinical entities.


Text: “This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.”
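The binary framing can be illustrated on the re_bodypart_direction example sentence. In this hypothetical sketch, every (Direction, Body Part) pair becomes a candidate and a classifier returns 1 (related) or 0 (not related). The `toy_classifier` adjacency rule stands in for the trained BioBERT model, so the labels come from that rule, not from the released model; the entity spans are also supplied by hand rather than by an NER model.

```python
from itertools import product

# Entities for "MRI demonstrated infarction in the upper brain stem, left
# cerebellum and right basil ganglia", as (text, first token, last token);
# token indices count words and the comma (MRI=0 ... ganglia=14).
DIRECTIONS = [("upper", 5, 5), ("left", 9, 9), ("right", 12, 12)]
BODY_PARTS = [("brain stem", 6, 7), ("cerebellum", 10, 10), ("basil ganglia", 13, 14)]


def toy_classifier(direction, body_part):
    """Hypothetical stand-in for a binary RE model: 1 if the direction directly precedes the body part."""
    return int(direction[2] + 1 == body_part[1])


def binary_relations(directions, body_parts, classify=toy_classifier):
    """Score every (direction, body part) pair with a 1/0 relation label."""
    return [(d[0], b[0], classify(d, b)) for d, b in product(directions, body_parts)]


results = binary_relations(DIRECTIONS, BODY_PARTS)
for direction, body_part, rel in results:
    print(f"{direction:6} -> {body_part:14} {rel}")
positives = [(d, b) for d, b, rel in results if rel == 1]
```

Because each model only has to answer "related or not" for one pair of entity types, the decision is simpler than choosing among many relation labels at once.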


How to use


New Resolver Models using JSL SBERT

  • sbiobertresolve_icd10cm_augmented
  • sbiobertresolve_cpt_augmented
  • sbiobertresolve_cpt_procedures_augmented
  • sbiobertresolve_icd10cm_augmented_billable_hcc
  • sbiobertresolve_hcc_augmented

Returning auxiliary columns mapped to resolutions: the chunk entity resolver and the sentence entity resolver now return auxiliary data that was mapped to the resolutions during training. This lets users get multiple resolutions from a single model without adding other annotators to the pipeline; otherwise, getting billable codes would require additional modules in the same pipeline.

We also added a cosine similarity metric to the resolver metadata, which is used to find the closest matches when resolving entities and yields better, more semantically correct results.
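These two resolver changes can be sketched together as a nearest-neighbour lookup: the chunk embedding is compared with each candidate's embedding by cosine similarity, and the best candidates come back with whatever auxiliary data was stored for them at training time, such as a billable flag or HCC code. This is a hypothetical, stdlib-only illustration, not the resolver's implementation or metadata format; the codes, three-dimensional vectors, and aux values are invented, whereas real sentence embeddings are much higher-dimensional.

```python
import math

# Hypothetical candidate table: code -> (embedding, auxiliary data mapped at training time).
# Codes, vectors, and aux values are made up for illustration.
CANDIDATES = {
    "CODE_A": ([0.9, 0.1, 0.0], {"billable": "1", "hcc": "12"}),
    "CODE_B": ([0.6, 0.6, 0.2], {"billable": "0", "hcc": "0"}),
    "CODE_C": ([0.0, 0.2, 0.9], {"billable": "1", "hcc": "0"}),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def resolve(chunk_embedding, candidates=CANDIDATES, k=2):
    """Rank candidate codes by cosine similarity; return the top k with their aux data."""
    scored = [
        (code, cosine(chunk_embedding, vec), aux)
        for code, (vec, aux) in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


for code, similarity, aux in resolve([1.0, 0.2, 0.0]):
    print(code, round(similarity, 3), aux)
```

Returning the aux data with each match is what lets a single lookup answer both "which code?" and "is it billable?" without a second component.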

  • sbiobertresolve_icd10cm_augmented_billable_hcc


Input Text: “bladder cancer”

  • sbiobertresolve_cpt_augmented


Input Text: “ct abdomen without contrast”


New Pretrained Clinical NER Model

  • NER Radiology

Input Text: “Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma.”
