Intel

John Snow Labs’ Spark NLP for Healthcare Library Speeds Up Automated Language Processing with Intel® AI Technologies

INDUSTRY: Healthcare/Life Sciences

Introduction: John Snow Labs’ Spark NLP for Healthcare implements the latest breakthroughs in deep learning, transfer learning, and transformers, providing researchers and practitioners with production-grade, scalable, and trainable versions of novel NLP research for the first time. The library is designed to run on the latest processors and to take advantage of Intel AI software optimizations. On Intel Xeon Scalable processors with Intel Math Kernel Library (Intel MKL) and Intel Optimizations for TensorFlow, Spark NLP for Healthcare can deliver up to 116 percent faster training.

Challenge: “Advances and breakthroughs in medicine and public health are built on research and prior learnings. That knowledge is spread across a wide range of content, such as the following:

  • Patient records
  • Imaging, genomic, and lab reports
  • Medical billing records
  • Research reports
  • White papers and articles
  • Clinical trial results
  • Medical and healthcare regulatory filings

Petabytes of new information are added every year, and researchers, analysts, and data scientists across the entire healthcare sector search, cull, and peruse it. They rely on automated systems that use artificial intelligence (AI) and natural language processing (NLP) libraries to search and analyze selected content and locate the data they need. Beyond the sheer volume of data, medicine and public health use thousands of unique terms and identifiers. Many critical facts required by healthcare AI applications, such as patient risk prediction, cohort selection, automated clinical coding, and clinical decision support, are locked in unstructured free-text data.”

Solution: “John Snow Labs offers a commercial version of their library for healthcare and life science data scientists, called Spark NLP for Healthcare. This version of the popular library provides a production-grade, scalable, and trainable implementation of novel healthcare-specific NLP algorithms and models. It includes pre-trained models for the most common medical NLP tasks.
Spark NLP for Healthcare extends the open-source library, raising the bar on achievable accuracy for tasks like clinical named entity recognition (NER), assertion status detection, entity resolution, de-identification, and Optical Character Recognition (OCR). The industry-specific library enables easy and automated access to information hidden across a broad range of documents, both physical and electronic.”

Benefit (subtitle): “Spark NLP for Healthcare Enhances Clinical and Life Sciences Research

Spark NLP Runs Faster for Lower Cost on Intel AI Technologies

Spark NLP is the only distributed and natively scalable NLP library today. It implements the latest breakthroughs in deep learning, transfer learning, and transformers, providing practitioners and enterprises with production-grade, scalable, and trainable versions of novel research for the first time.

Spark NLP was adopted by 16% of enterprises within 18 months of its first release.

It has remained the most widely used NLP library in the enterprise.

It continues to improve rapidly, with 30 new releases in 2019.”

Benchmarks were run on a 2 GB training dataset of roughly 108K sentences for a French named entity recognition task. Testing showed a 116% improvement in training speed with Intel MKL and Intel Xeon Scalable processors.
Compared with previous-generation Intel Xeon processors, the current Intel processors and software optimizations deliver more than a 2X speedup for Spark NLP.
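
The 116% figure and the "over 2X" figure describe the same result if "116% faster" is read as a throughput gain over the baseline. That reading is an assumption, since the summary does not define the metric. Under it, the arithmetic can be sketched as follows (the function names are illustrative and are not part of Spark NLP or any Intel tool):

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert an 'X percent faster' claim into a throughput multiplier.

    Assumes 'percent faster' means a throughput gain over the baseline,
    so 0% maps to 1.0x and 100% maps to 2.0x.
    """
    return 1.0 + percent_faster / 100.0


def relative_time(percent_faster: float) -> float:
    """Fraction of the baseline wall-clock time needed for the same work."""
    return 1.0 / speedup_factor(percent_faster)


factor = speedup_factor(116.0)
fraction = relative_time(116.0)
print(f"{factor:.2f}x throughput, {fraction:.1%} of baseline training time")
# prints "2.16x throughput, 46.3% of baseline training time"
```

Read this way, a 116% improvement means the same training job finishes in a little under half the baseline time, which matches the "over 2X" statement.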

“Intel optimizations and 2nd Gen Intel Xeon® Scalable processors deliver up to 116% faster performance for the healthcare-specific Natural Language Processing library.”
