Using Spark NLP to De-Identify Doctor Notes in the German Language

The ability to extract clinical information at large scale and in real time from unstructured clinical notes is becoming a mission-critical capability for IQVIA. Key data elements such as tumor stage and size, Social Determinants of Health, and ejection fractions are not available in typical structured EMR records. Additionally, the Cures Act Final Rule brings unstructured notes into play from Oct 22 in the US, with the EU likely to follow suit. The market will rapidly come to expect that CROs and data vendors can leverage unstructured notes for clinical trial recruitment, study registries, precision medicine, and manufacturing longitudinal datasets.

The first step in unlocking the value of unstructured healthcare data is efficient handling of Personally Identifiable Information. Accurately anonymizing medical data is challenging when the data is in multiple languages beyond English, and when it consists of short doctor notes, which pose a different challenge from proper documents with paragraphs and sentences.

In this talk we show the data flow for what is probably the largest multi-country EMR data platform in the world, focusing on the de-identification module that allows doctor notes to be safely ingested and opened up for analytics. We go into the details of the Spark NLP de-identification model and pipeline built to handle German texts, share results, and summarize lessons learned, including combining rules and models for entity types where a rules-based approach outperforms trained models.
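
The idea of combining rules and models can be illustrated with a minimal Python sketch. It is not the pipeline presented in the talk and does not use the Spark NLP API. The date pattern, the `Span` type, the helper names, and the hard-coded "model" predictions are all hypothetical stand-ins, and the merge policy (rule spans win wherever they overlap a model span) is just one plausible choice. The sketch also assumes that the model spans do not overlap one another.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # entity type, e.g. "DATE" or "NAME"
    source: str  # "rule" or "model"


# Hypothetical rule: German-style numeric dates such as 03.10.2023.
DATE_RULE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{2,4}\b")


def rule_spans(text: str) -> List[Span]:
    """Return one DATE span per match of the date rule."""
    return [Span(m.start(), m.end(), "DATE", "rule")
            for m in DATE_RULE.finditer(text)]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open character ranges intersect."""
    return a.start < b.end and b.start < a.end


def merge_spans(rules: List[Span], model: List[Span]) -> List[Span]:
    """Keep every rule span; keep a model span only if no rule span overlaps it."""
    kept = list(rules)
    kept += [m for m in model if not any(overlaps(m, r) for r in rules)]
    return sorted(kept, key=lambda s: s.start)


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with a <LABEL> placeholder (spans sorted, non-overlapping)."""
    out, cursor = [], 0
    for s in spans:
        out.append(text[cursor:s.start])
        out.append(f"<{s.label}>")
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


text = "Pat. Schmidt am 03.10.2023 untersucht."

# Stand-in for NER model output: a correct NAME span plus a partial,
# mislabeled span that covers only the start of the date.
name_at = text.index("Schmidt")
date_at = text.index("03.10")
model_spans = [
    Span(name_at, name_at + len("Schmidt"), "NAME", "model"),
    Span(date_at, date_at + 5, "NUMBER", "model"),
]

merged = merge_spans(rule_spans(text), model_spans)
print(mask(text, merged))  # Pat. <NAME> am <DATE> untersucht.
```

Here the rule recovers the full date even though the stand-in model only tagged part of it, while the name, which no rule covers, still comes from the model.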

Using Spark NLP in R: a Drug Standardization Case Study

In this talk Katie Goznikar will show you how IDEXX Laboratories has leveraged John Snow Labs pretrained models and the R programming...