
NLP Case Studies

Spark NLP

Many critical facts required by healthcare AI applications like patient risk prediction, cohort selection and clinical decision support are locked in unstructured free-text data. Recent advances in deep learning have raised the bar on achievable accuracy for tasks like biomedical named entity recognition, assertion status detection, entity resolution, de-identification and others. This case study presents the first industrial-grade implementation of these new results and its application at scale.

Roche is the world’s #1 company for in-vitro diagnostics and its medicines are used to treat over 130 million people each year. It’s building a clinical decision support product portfolio, starting with oncology. Roche is using Spark NLP for Healthcare to extract clinical facts from pathology and radiology reports. The case study covers the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.

Roche applies Spark NLP for Healthcare to extract clinical facts from pathology and radiology reports, and to simplify training, optimization, and inference of such domain-specific models at scale.

Principal Data Scientist for Diagnostic Information Systems at Roche
AI Platform Architecture

Advances and breakthroughs in medicine and public health are built on research and prior learnings. This knowledge is contained in a wide range of content, such as the following:

  • Patient records
  • Imaging, genomic, and lab reports
  • Medical billing records
  • Research reports
  • White papers and articles
  • Clinical trial results
  • Medical and healthcare regulatory filings

Petabytes of new information are added every year, and researchers, analysts, and data scientists across the entire healthcare sector search, cull, and peruse it. They rely on automated systems that use artificial intelligence (AI) and Natural Language Processing (NLP) libraries to search and analyze selected content and locate the data they need.

Intel optimizations and 2nd Gen Intel Xeon® Scalable processors deliver up to 116 percent faster performance for the healthcare-specific Natural Language Processing library.

AI & NLP Experts At Your Service

In recent years, traditional factors of financial markets, such as growth vs. value, market capitalization, credit rating, and stock price volatility, have become less predictive, requiring investors to explore new data sources such as news, images, and social network content. The goal of the current project is to build a tool that classifies companies into the market segments in which they operate. The classification is based on the “Thomson Reuters Business Classification” taxonomy.

Using artificial intelligence methods and machine learning algorithms, we predicted market labels from textual data by applying text mining techniques to news stories.
In this paper, we describe a Natural Language Processing (NLP) approach that uses Spark NLP and semantic techniques to assist domain experts in classifying documents with different market labels.

Spark NLP

DocuSign has been on a mission to accelerate business and simplify life for companies and people around the world. The company pioneered the development of e-signature technology, and today DocuSign helps organizations connect and automate how they prepare, sign, act on, and manage agreements.

This case study shows how the company leveraged John Snow Labs’ Spark NLP to automate the extraction of structured information from document images, including:

  • Contracts
  • Tax forms
  • Passport applications
  • Invoices

The DocuSign and John Snow Labs team solved challenges such as:

  • High and growing variation in layout
  • Unbounded field type complexity
  • Unstructured information to optimize hospital patient flow models

I am very appreciative of all the hard work put into this project. The staff went above and beyond my expectations and were accommodating. The staff also was very attentive and thought “outside the box”. Overall, I am very satisfied and will gladly work with the staff in the near future again.

Johnson & Johnson
Life Sciences Data Catalog

Many businesses still depend on documents stored as images, from receipts, manifests, invoices, medical reports, and ID cards snapped with mobile phone cameras to contracts, waivers, leases, forms, and audit records digitized with scanners. Extracting high-quality data from these images comes with three challenges. The first is OCR, for example dealing with crumpled receipts photographed from an angle in a dimly lit room. The second is NLP: extracting normalized values and entities from the natural language text. The third is building predictors or recommendations that suggest the best next action and, in particular, can deal with missing, wrong, or conflicting information generated by the previous steps.

This case study illustrates an AI system that reads millions of pages of patient information, gathered from hundreds of sources, resulting in a great variety of image formats, templates, and quality. It explores the solution architecture and key lessons learned in going from raw images to a deployed predictive workflow based on facts extracted from the scanned documents.

The good news is that state-of-the-art deep learning techniques can now approach human accuracy in these three tasks—and do so at scale.

Chief Clinical Officer at SelectData
John Snow Labs
ESG Document Classification

There is an immense amount of unstructured data generated every day that can affect companies and their position in the market. As this information continuously grows, it’s a critical task for decision makers to process, quantify and analyze this data to identify opportunity and risk. One of the important indicators in this kind of analysis is ESG (environmental, social and governance) rating, which identifies issues for a company in these critical areas.

This white paper describes how ESG issues are identified automatically in documents continuously ingested from world news sources. The models have been deployed in production as part of a big data analytics platform of a leading data provider to the financial services industry.

John Snow Labs provided excellent results using advanced machine learning and NLP techniques. This was combined with a strong delivery process – annotations, data science, to a large-scale production deployment.

Spark NLP Case studies

Answering questions accurately based on information from financial documents, which can be a hundred or more pages long, is a challenge even for human domain experts. While traditional rule-based or expression-matching techniques work for simple fields in templated documents, it is harder to infer facts based on implied statements, on the absence of certain statements, or on the combination of other facts. Answering such questions at a very high level of accuracy requires state-of-the-art deep learning techniques applied to NLP.

Spark NLP was used to augment the UiPath smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents. This case study covers the technical challenges, the architecture of the full solution, and lessons learned that you can directly apply to your next data extraction project.

UiPath is excited to support this technology partnership and support a seamless integration of John Snow Labs’ state-of-the-art NLP technology inside UiPath Activities. The joint capability is already providing value to business customers and is broadly applicable.

Senior Manager for Partnerships and Alliances at UiPath
Life Science Datasets

Recruiting patients for clinical trials is a major challenge in drug development. Finding patients requires an in-depth understanding of their medical histories and current health statuses, yet the majority of patient data is unstructured and spread across physician notes, pathology, imaging, genomic, and other reports. For this reason, clinical trial recruitment is a slow and manual process.

This case study describes how Deep 6 uses the Spark natural language processing (NLP) platform to apply state-of-the-art deep learning to accurately extract the relevant clinical facts from unstructured text. These facts are then used in subsequent data science pipelines to construct patients’ medical histories.