Two years ago I joined John Snow Labs and had the opportunity to work with a number of data-driven companies. These companies wanted to analyze massive amounts of natural language data in real time and were fearless when it came to experimenting with new technologies. It was then that I began to hear whispers of analyzing text via Natural Language Processing (NLP).

At John Snow Labs we make our living developing commercial software, so it made sense to develop an NLP library that was production-grade. We now have a dedicated research team that oversees the development of our open-source offering, Spark-NLP. It comes fully integrated with our commercial offering, the Data Lab Modelserver, so that any Spark pipeline can be deployed and queried via a simple API. These two offerings have allowed our client Kaiser Permanente (one of the largest healthcare consortiums in the United States, with a revenue of $64 billion) to develop and deploy their machine learning models in production.

 

Here are my three favorite aspects of how we use NLP at John Snow Labs:

 

  1. Research and Development

The Spark-NLP library allows you to build useful, insightful annotations of your data quickly and efficiently. It is built on top of Apache Spark and its Spark ML library with these ideas in mind:

  • It should have excellent runtime performance
  • It should be able to reuse existing Spark libraries and resources
  • It must be able to serve mission-critical, enterprise-grade projects

We compared our library to the popular Explosion AI spaCy library and found that Spark-NLP is orders of magnitude better than spaCy in terms of performance, accuracy, and scalability (see benchmarks here). Both libraries are open-source with commercially permissive licenses (Apache 2.0 and MIT, respectively). Each also has public documentation, so consider reading the spaCy 101 and Spark-NLP Quick Start guides.

 

  2. Integration with Data Lab Modelserver

Those of you working on data science projects know that once training, validation, and model creation are done, deploying models into production is the hardest part. The John Snow Labs Data Lab platform resolves this issue through our commercial offering of a Modelserver that supports both PMML and Spark ML models. With PMML we can deploy numerous model types, such as TreeModel, NeuralNetwork, and RegressionModel. With Spark support, the Modelserver can also deploy any Spark pipeline, including those developed using the Spark-NLP library. Interacting with the Modelserver is simple and can be done with an HTTP request library in any language. In addition, application developers can ensure that the models expose business logic. Swagger docs are available, and with one click you can deploy and query the models.
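As a rough illustration of what "a requests library in any language" can look like, here is a minimal Python sketch that uses only the standard library. Everything specific in it is an assumption made for illustration: the base URL, the `/models/<name>/predict` route, the model name `sentiment-pipeline`, and the `{"text": ...}` payload are hypothetical, and the real contract is whatever the Modelserver's Swagger docs describe.

```python
import json
import urllib.request

# Hypothetical values: the real base URL, route, and payload schema come from
# the Modelserver's Swagger docs, which this article does not reproduce.
BASE_URL = "http://localhost:8080"
MODEL_NAME = "sentiment-pipeline"


def build_query(base_url, model_name, text):
    """Build (but do not send) a JSON POST request for one input text."""
    url = f"{base_url.rstrip('/')}/models/{model_name}/predict"  # assumed route
    body = json.dumps({"text": text}).encode("utf-8")  # assumed schema
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_model(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_query(BASE_URL, MODEL_NAME, "The staff were friendly and helpful.")
    print(req.get_method(), req.full_url)
    # Uncomment once a server is actually listening at BASE_URL:
    # print(query_model(req))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without a running server, which is handy when the same call is later wired into an application.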

 

  3. Using Spark-NLP in Production

John Snow Labs works with Kaiser Permanente on several joint projects, including the development of clinical annotation for Spark-NLP and their sentiment analysis model. Our Spark-NLP-integrated Modelserver allowed Kaiser to deploy Spark models built with the Spark-NLP library. They were also able to use the same Modelserver as a framework for continuous integration and deployment of their machine learning models. Kaiser can now run the full model development lifecycle and test model updates using an automation server such as Jenkins.

 

To find out more about how our software can help in your NLP problem space, schedule a demo.