
    How We Use Natural Language Processing at John Snow Labs

    Ph.D. in Computer Science – Head of Product

    Two years ago I joined John Snow Labs and had the opportunity to work with a number of data-driven companies. These companies wanted to analyze massive amounts of natural language data in real time and were fearless when it came to experimenting with new technologies. It was then that I began to hear whispers of analyzing text via Natural Language Processing (NLP).

    At John Snow Labs we make our living developing commercial software, so it made sense to develop an NLP library that was production-grade. We now have a dedicated research team that oversees the development of our open-source offering, “Spark-NLP.” It comes fully integrated with our commercial offering, the Data Lab Modelserver, so that any Spark pipeline can be deployed and queried via a simple API. Together, these two offerings have allowed our client Kaiser Permanente (one of the largest healthcare consortiums in the United States, with a revenue of $64 billion) to develop their machine learning models and deploy them in production.

     

    Here are my three favorite aspects of how we use NLP at John Snow Labs:

     

    Research and Development

    The Spark-NLP library allows you to build useful, insightful annotations of your data quickly and efficiently. It is built on top of Apache Spark and its Spark ML library with these ideas in mind:

    • It should have excellent runtime performance
    • It should be able to reuse existing Spark libraries and resources
    • It must be able to serve mission-critical, enterprise-grade projects
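To make the pipeline idea behind these goals concrete, here is a minimal sketch in plain Python. It is not the Spark-NLP or Spark ML API: the `Stage` and `Pipeline` classes and the column names are hypothetical stand-ins that only illustrate how each stage reads one column and adds a new annotation column, which is the pattern Spark ML pipelines follow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "row" is a plain dict of column name -> value; each stage adds one column.
Row = Dict[str, object]


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage: reads one column, writes another."""
    input_col: str
    output_col: str
    fn: Callable[[object], object]

    def transform(self, row: Row) -> Row:
        out = dict(row)  # copy so the input row is left untouched
        out[self.output_col] = self.fn(row[self.input_col])
        return out


@dataclass
class Pipeline:
    """Runs stages in order, so later stages can consume earlier annotations."""
    stages: List[Stage] = field(default_factory=list)

    def transform(self, row: Row) -> Row:
        for stage in self.stages:
            row = stage.transform(row)
        return row


pipeline = Pipeline([
    # Whitespace tokenizer: text -> tokens
    Stage("text", "tokens", lambda text: text.split()),
    # Normalizer: lowercase each token and strip surrounding punctuation
    Stage("tokens", "normalized", lambda tokens: [t.lower().strip(".,!?") for t in tokens]),
])

annotated = pipeline.transform({"text": "Spark NLP annotates text."})
```

In the real library the stages would be Spark transformers operating on distributed DataFrames rather than single dicts; the sketch only shows why annotations compose naturally as columns.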

    We benchmarked our library against the popular spaCy library from Explosion AI and found that, in our tests, Spark-NLP was orders of magnitude better than spaCy in terms of performance, accuracy, and scalability (see benchmarks here). Both libraries are open-source with commercially permissive licenses (Apache 2.0 for Spark-NLP and MIT for spaCy). Each also has public documentation, so consider reading the spaCy 101 and Spark-NLP Quick Start guides.

     

    Integration with Data Lab Modelserver

    Those of you working on data science projects know that after training, validation, and model creation, the hardest part is deploying models into production. The John Snow Labs Data Lab platform resolves this issue through our commercial Modelserver, which supports both PMML and Spark ML models. With PMML we can deploy numerous model types, such as TreeModel, NeuralNetwork, and RegressionModel. With Spark support, the Modelserver can also deploy any Spark pipeline, including those developed using the Spark-NLP library. Interaction with the Modelserver is simple and can be accomplished through an HTTP request library in any language. In addition, application developers can ensure that the models expose business logic. Swagger docs are available, and you can deploy and query models with one click.
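As a rough illustration of what querying a deployed model over HTTP might look like, here is a sketch using only the Python standard library. The host, the `/models/<name>/predict` route, and the `{"text": ...}` payload are assumptions made for this example; the actual routes and schema are whatever the Modelserver's Swagger docs specify.

```python
import json
import urllib.request

# Hypothetical values: the real Modelserver host, route, and payload schema
# are defined by its Swagger docs, not by this sketch.
MODELSERVER_URL = "http://localhost:8080"


def build_query(model_name: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a deployed model."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{MODELSERVER_URL}/models/{model_name}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query("sentiment", "The visit went well.")

# To send it against a running server, something like:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Building the request separately from sending it keeps the example runnable without a server and makes the request shape easy to check in an automated test.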

     

    Using Spark-NLP in Production

    John Snow Labs works with Kaiser Permanente on several joint projects, including the development of clinical annotation for Spark-NLP and their sentiment analysis model. Our Spark-NLP integrated Modelserver allowed Kaiser to deploy Spark models built with the Spark-NLP library. They were also able to use the same Modelserver as a framework for continuous integration and deployment of their machine learning models. Kaiser can now manage the full model development lifecycle and test model updates using an automation server like Jenkins.

     

    To find out more about how our software can help in your NLP problem space, schedule a demo.

    Our additional expert:
    Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then shifted her focus to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has extensive experience working with different annotation tools and leading document classification and NER extraction projects in verticals such as Finance, Investment, Banking, and Healthcare.
