Roche

Spark NLP: How Roche Automates Knowledge Extraction from Pathology Reports


INDUSTRY: Healthcare

Introduction: “Roche is the world’s #1 company for in-vitro diagnostics and its medicines are used to treat over 130 million people each year. It’s building a clinical decision support product portfolio, starting with oncology. Roche is using Spark NLP for Healthcare to extract clinical facts from pathology and radiology reports.
This case study covers the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.”

Challenge: Unstructured free-text medical notes are the only source for many critical facts in healthcare. As a result, accurate natural language processing is a critical component of many healthcare AI applications, such as clinical decision support, clinical pathway recommendation, cohort selection, and patient risk or abnormality detection. Recent advances in deep learning for NLP have enabled a new level of accuracy and scalability for clinical language understanding, making a broad set of applications possible for the first time.

Solution: “Roche used Spark NLP for Healthcare and OCR to tackle the challenges of a healthcare setting: understanding clinical terminology, extracting specialty-specific facts of interest, and using transfer learning to minimize the required amount of task-specific annotation.

Integrating MLflow with Spark NLP was crucial for tracking experiments and reproducing results.”

Benefit: “Higher accuracy with Spark NLP, specialized for medical data, and reduced time to train models.

OCR with high accuracy and the ability to retain document structure such as tables, lists, and backgrounds.”

Spark NLP and OCR helped extract clinical facts from unstructured free text (such as pathology and radiology reports) to aid clinical decision support.

“Roche applies Spark NLP for Healthcare to extract clinical facts from pathology and radiology reports, and to simplify training, optimization, and inference of such domain-specific models at scale.”

Principal Data Scientist for Diagnostic Information Systems at Roche