How We Use Natural Language Processing at John Snow Labs

06.03.2018

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

Two years ago I joined John Snow Labs and had the opportunity to work with a number of data-driven companies. These companies wanted to analyze massive amounts of natural language data in real time and were fearless when it came to experimenting with new technologies. It was then that I began to hear whispers of analyzing text via Natural Language Processing (NLP).

At John Snow Labs we make our living developing commercial software, so it made sense to develop an NLP library that was production-grade. We now have a dedicated research team that oversees the development of our open-source library, Spark-NLP. It comes fully integrated with our commercial Data Lab Modelserver, so any Spark pipeline can be deployed and queried via a simple API. Together, these two offerings have allowed our client Kaiser Permanente (one of the largest healthcare consortiums in the United States, with revenue of $64 billion) to develop and deploy their machine learning models in production.

Here are my three favorite aspects of how we use NLP at John Snow Labs:

Research and Development

The Spark-NLP library allows you to build useful, insightful annotations of your data quickly and efficiently. It is built on top of Apache Spark and its Spark ML library with these ideas in mind:

It should have excellent runtime performance
It should be able to reuse existing Spark libraries and resources
It must be able to serve mission-critical, enterprise-grade projects

We compared our library to the popular spaCy library from Explosion AI and found that Spark-NLP is orders of magnitude better than spaCy in terms of performance, accuracy, and scalability (see benchmarks here). Both libraries are open source with commercially permissive licenses (Apache 2.0 and MIT, respectively). Each also has public documentation; the spaCy 101 and Spark-NLP Quick Start guides are good starting points.

Integration with Data Lab Modelserver

Those of you working on data science projects know that once a model has been trained and validated, deploying it into production is often the hardest part. The John Snow Labs Data Lab platform addresses this through our commercial Modelserver, which supports both PMML and Spark ML models. With PMML we can deploy many model types, such as TreeModel, NeuralNetwork, and RegressionModel. With Spark support, the Modelserver can also deploy any Spark pipeline, including those developed using the Spark-NLP library. Interacting with the Modelserver is simple and can be done through an HTTP request library in any language. In addition, application developers can ensure that the models expose business logic. Swagger docs are available, and you can deploy and query models with one click.
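To make the "request library in any language" point concrete, here is a minimal Python sketch that uses only the standard library. The endpoint URL and the JSON payload shape are assumptions made for illustration; this article does not specify the actual Modelserver API, so check the Swagger docs of your own deployment for the real paths and fields.

```python
import json
from urllib import request

# Hypothetical endpoint: the real Modelserver URL scheme is not given in
# this article. Replace it with the path shown in your Swagger docs.
MODELSERVER_URL = "http://localhost:8080/models/sentiment/predict"


def build_query(text: str, url: str = MODELSERVER_URL) -> request.Request:
    """Build (but do not send) a JSON POST request for a single document.

    The {"text": ...} payload shape is an assumption, not the documented API.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_model(text: str, url: str = MODELSERVER_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with request.urlopen(build_query(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without a running server. Calling `query_model("The nurse was very kind.")` would only succeed against a deployment that actually exposes a matching endpoint.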

Using Spark-NLP in Production

John Snow Labs works with Kaiser Permanente on several joint projects, including the development of clinical annotation for Spark-NLP and their sentiment analysis model. Our Spark-NLP-integrated Modelserver allowed Kaiser to deploy Spark models built with the Spark-NLP library. They were also able to use the same Modelserver as a framework for continuous integration and deployment of their machine learning models. Kaiser can now manage the full lifecycle of model development and test updates to a model using an automation server like Jenkins.
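As an illustration of what testing a model update in an automation server might look like, the sketch below shows a simple accuracy gate that a CI job could run before promoting a new model. It is a generic pattern, not a description of Kaiser's actual pipeline; the function names, the tolerance value, and the toy classifier are all hypothetical.

```python
from typing import Callable, Iterable, Tuple


def accuracy(predict: Callable[[str], str],
             labelled: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (text, label) pairs for which predict(text) == label."""
    pairs = list(labelled)
    if not pairs:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, label in pairs if predict(text) == label)
    return correct / len(pairs)


def passes_gate(candidate_acc: float, baseline_acc: float,
                tolerance: float = 0.01) -> bool:
    """Accept the candidate unless it trails the baseline by more than
    `tolerance` (a hypothetical threshold; tune it for your project)."""
    return candidate_acc >= baseline_acc - tolerance


# Toy stand-in for a deployed sentiment model, for demonstration only.
def toy_predict(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"


HELD_OUT = [
    ("Good care from the staff", "positive"),
    ("The wait was too long", "negative"),
    ("Really good follow-up", "positive"),
    ("Fine, I suppose", "positive"),  # the toy model gets this one wrong
]
```

In a CI job, a script would compute `accuracy(...)` for the candidate model on a held-out set and exit with a non-zero status when `passes_gate(...)` returns `False`, which causes the build to fail.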

To find out more about how our software can help in your NLP problem space, schedule a demo.

As we explore natural language processing applications, consider how innovations like Generative AI in Healthcare and Healthcare Chatbot technologies are transforming the way we analyze and interact with patient data.

Our additional expert:

Dia Trambitas is an AI Product Manager with deep expertise in Natural Language Processing and applied Generative AI. At John Snow Labs, Dia has led the development of the Generative AI Lab — a no-code platform for data annotation and model training — as well as the Medical Chatbot, a secure and domain-specific conversational AI assistant tailored for clinical environments. With a strong focus on practical deployments of cutting-edge AI, she has worked at the intersection of healthcare and technology, driving product innovation that empowers users to harness large language models safely and effectively. Passionate about transforming unstructured data into actionable insights, Dia brings a strategic and user-centered approach to building AI tools that are both powerful and accessible.
