
John Snow Labs is Named NLP Company of the Month & Announces Major New Release of Spark NLP

Technology Headlines highlights John Snow Labs as the emerging industry leader for state-of-the-art natural language processing in healthcare

John Snow Labs is most widely recognized as the developer of the Spark NLP library, providing state-of-the-art NLP capabilities to the open-source community. Based on a February 2019 O’Reilly survey of 1,300 professionals, Spark NLP is the world’s most widely used NLP library in the enterprise. Within less than two years of its first release, it is twice as popular as the next NLP library on the list.

“John Snow Labs is emerging as the clear industry leader for state-of-the-art NLP in healthcare. We cannot recommend a better way to apply the most current, accurate, and scalable technology to your natural language understanding challenges today,” says Carlos Chavez, Editor-in-Chief, The Technology Headlines.

For a number of common NLP use cases, Spark NLP delivers state-of-the-art accuracy: production-grade versions of the best performing academic peer-reviewed results to date. Spark NLP was the first to provide production-grade, scalable, and trainable versions of the most recent deep-learning based research for NLP. It was the first to productize and open-source BERT embeddings for named entity recognition. Spark NLP ships with over 30 pre-trained pipelines and models – enabling users to get things done within minutes and a few lines of code.

John Snow Labs’ Spark NLP is the only natively distributed open-source text-processing library for Python & Scala. Zero code changes are needed to scale a pipeline to any Spark cluster. It provides production-grade versions of the latest research in NLP – raising the bar on accuracy, speed, and scalability.

Spark NLP is the only NLP library that can leverage both GPUs and current Intel Xeon processors to their maximum potential. It’s faster than alternatives on a single machine as well – 80 times faster than spaCy for training a simple NLP pipeline on one mid-range machine in a public benchmark.

Spark NLP is adopted across the world. A recent analysis showed that interest is 44% from the Americas, 24% from the Asia-Pacific region, and 22% from Europe & the Middle East.

Continuous Innovation: Better Functionality and Accuracy

John Snow Labs has a full development team dedicated to Spark NLP and another full team that’s building Spark NLP for Healthcare. The company made 26 new releases in 2018 and 16 new releases in the first half of 2019. The frequent releases and ongoing stream of new & improved features helped build momentum and a lot of trust in the library within the data science community.

John Snow Labs announces its latest major release, Spark NLP 2.2. This release improves accuracy and enables new use cases as prioritized by our customers and the community. Major new features include OCR-based coordinate highlighting, BERT embeddings refactoring and tuning, new tools for accuracy evaluation in Python, and more:

* Named Entity Recognition with deep learning now has an `includeConfidence` parameter that returns confidence scores in the prediction metadata.

* Named Entity Recognition with deep learning now also has an `enableOutputLog` parameter that writes training metric logs to a file, making it easier to track and optimize long model training runs.

* OCRHelper now returns a coordinate positions matrix for text converted from PDF documents.

* A new annotator called PositionFinder consumes OCRHelper positions to return rectangle coordinates for CHUNK annotator types. This enables visualizing where each chunk originally came from in a PDF.

* The evaluation module has now also been ported to Python. It provides accuracy metrics for each epoch in a machine learning or deep learning training run for new NLP models.

* WordEmbeddings now includes coverage metadata. Two new static functions, `withCoverageColumn` and `overallCoverage`, report coverage metrics.

* A new `poolingLayer` parameter in BERT allows for pooling layer selection. This has been shown to improve accuracy for some domain-specific NLP use cases.
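
To make the embedding coverage item in the list above more concrete, here is a minimal sketch in plain Python of what a coverage metric could measure: the share of tokens that have an entry in the embeddings vocabulary. It is not Spark NLP code and does not reproduce how `withCoverageColumn` or `overallCoverage` are implemented; the `Coverage` type, the `embedding_coverage` function, and the sample vocabulary are hypothetical names used only for illustration.

```python
from typing import Iterable, NamedTuple, Set


class Coverage(NamedTuple):
    """Hypothetical result type: covered tokens, total tokens, and their ratio."""
    covered: int
    total: int
    ratio: float


def embedding_coverage(tokens: Iterable[str], vocabulary: Set[str]) -> Coverage:
    """Count how many tokens appear in the embeddings vocabulary.

    Assumption: a token is "covered" when the vocabulary contains it exactly.
    """
    total = 0
    covered = 0
    for token in tokens:
        total += 1
        if token in vocabulary:
            covered += 1
    # Avoid division by zero for an empty token stream.
    ratio = covered / total if total else 0.0
    return Coverage(covered, total, ratio)


# Illustrative data only: a tiny vocabulary and one tokenized sentence.
vocabulary = {"patient", "was", "given", "aspirin"}
tokens = ["patient", "was", "given", "zolpidem"]

result = embedding_coverage(tokens, vocabulary)
print(result)
```

A low ratio on domain-specific text suggests that many terms fall outside the embeddings vocabulary, which is the kind of signal a coverage metric is meant to surface before training a model on those embeddings.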

Spark NLP 2.2 is immediately available. The project’s NLP Quick Start guides installation for a diverse set of development environments.