John Snow Labs is invited to speak at O’Reilly AI on using NLP to interpret patient journeys from millions of free-text medical records

11.09.2019

Ida Lucente

Marketing Communications Lead at John Snow Labs

Award-winning AI and NLP company John Snow Labs has been invited to present at O’Reilly AI for the third year in a row. This year the company will jointly present one of its most impactful case studies, on using NLP to interpret medical records.

"Interpreting millions of patient stories with deep learned NLP OCR" will be delivered by Alberto Andreotti, a data scientist at John Snow Labs, and Stacy Ashworth, Chief Clinical Officer at SelectData. The talk describes how John Snow Labs’ state-of-the-art Spark NLP for Healthcare, and more specifically its clinical NLP, extracts high-quality facts from medical records accurately and at scale.

Many businesses still depend on documents stored as images—from receipts, manifests, invoices, medical reports, and ID cards snapped with mobile phone cameras to contracts, waivers, leases, forms, and audit records digitized with scanners. Extracting high-quality data from these images comes with three challenges. First is OCR, as in dealing with crumpled receipts photographed from an angle in a dimly lit room. Second is NLP, extracting normalized values and entities from the natural language text. The third is building predictors or recommendations that suggest the best next action—and in particular can deal with missing, wrong, or conflicting information generated by the previous steps.

The good news is that state-of-the-art deep learning techniques, now available as open source software, can approach human accuracy in these three tasks—and do so at scale. Stacy Ashworth and Alberto Andreotti explore a case study of an AI system that reads millions of pages of patient information, gathered from hundreds of sources, resulting in a great variety of image formats, templates, and quality. They explore the solution architecture and key lessons learned in going from raw images to a deployed predictive workflow based on facts extracted from the scanned documents.

The talk will introduce Spark NLP for Healthcare – a natively distributable, deep learning-based library – and its OCR capabilities. The OCR library employs adaptive scaling, rotation, and erosion to achieve a significant accuracy boost compared to Tesseract. Spark NLP applies techniques such as BERT embeddings, trainable pipelines, and DL-based sentence segmentation and spell checking that materially improve accuracy for OCR-sourced text mining. Since both libraries are native extensions of Apache Spark, a unified pipeline can be written in Python, Java, or Scala for all three stages (including ML based on the results of OCR and NLP), enabling a new level of scale, speed, and reproducibility for the entire pipeline from image to next-best action.
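For readers unfamiliar with the image clean-up steps named above, erosion is a standard morphological operation that strips away thin noise from a binarized scan before text recognition. The following is a minimal plain-Python sketch of the general technique, not the library's own implementation; the function name `erode`, the `radius` parameter, and the toy `page` bitmap are assumptions made purely for illustration.

```python
def erode(image, radius=1):
    """Binary erosion with a square structuring element.

    image: list of rows, each a list of 0/1 values (1 = ink).
    A pixel stays 1 only if every pixel in its (2*radius+1) x (2*radius+1)
    neighborhood is 1; pixels outside the image count as 0.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            keep = 1
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if not inside or image[ny][nx] == 0:
                        keep = 0
            result[y][x] = keep
    return result


# Hypothetical toy bitmap: a 3x3 block of ink plus one isolated speck of noise.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = erode(page)
```

In this toy case the isolated speck disappears and the solid block shrinks to its center pixel, which is why erosion is normally applied selectively and combined with other steps such as scaling and rotation correction.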

Ida Lucente is Marketing Communications Lead at John Snow Labs, experienced in branding, marketing strategy, and communications, with a demonstrated history of working in the marketing and advertising industry. For media inquiries: Ida Lucente, John Snow Labs, ida@johnsnowlabs.com
