John Snow Labs Releases Spark NLP 4.0, Delivering 8x Speedups, Native M1 Support, and 1,000+ New Models to the Most Used NLP Library in the Enterprise

28.06.2022

Maziyar Panahi

Principal AI / ML Engineer and a Senior Team Lead

Modern Extractive Question Answering Annotators, Notable Performance Improvements, and State-of-the-Art Models Define Spark NLP 4.0

John Snow Labs, the Healthcare AI and NLP company and developer of the Spark NLP library, announced the release of Spark NLP 4.0. The release adds new question answering annotators, major performance improvements, optimizations for new hardware platforms, and more than 1,000 state-of-the-art pre-trained transformer models in multiple languages, making Spark NLP 4.0 the company's most significant release this year. It reflects John Snow Labs' ongoing commitment to delivering the latest, most accurate NLP software to the global AI community.

The introduction of question answering annotators in Spark NLP enables the software to answer arbitrary natural language questions based on a given document. The models return both an answer and an explanation of where in the document the answer came from. Hundreds of pre-trained models based on BERT, ALBERT, DeBERTa, RoBERTa, DistilBERT, Longformer, and XLM-RoBERTa are available out of the box, supporting multiple languages, document types, and performance goals. Because the models come already trained and fine-tuned, users can start using them immediately. This is another step toward making NLP easier to use for building production-grade systems.

Spark NLP 4.0 is optimized for the latest hardware, with official support for the Apple silicon M1 chip and for Intel's oneAPI Deep Neural Network Library (oneDNN). Enabling oneDNN can improve the performance of transformer-based models running on CPUs by up to 97%. In addition, thanks to enhancements for NVIDIA GPUs, users are seeing speedups of up to 700%. Other notable capabilities include support for the latest runtimes of Databricks, AWS EMR, and Kubernetes.
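The paragraph above quotes gains as percentages, while the headline uses a multiplier ("8x"). The following is a minimal sketch of how the two relate. It assumes each percentage is measured relative to the unaccelerated baseline, so a 700% gain means new throughput equals the baseline plus seven times the baseline. The function name and the 80-minute job are hypothetical illustrations and are not part of Spark NLP.

```python
def percent_gain_to_speedup(percent_gain: float) -> float:
    """Convert a relative improvement (e.g. 700 for "700%") into a speedup factor.

    Assumption: the gain is relative to the baseline, i.e.
    new_throughput = baseline_throughput * (1 + percent_gain / 100).
    """
    return 1.0 + percent_gain / 100.0


# Figures quoted in the release
gpu_speedup = percent_gain_to_speedup(700)    # 8.0x, consistent with the "8x" headline
onednn_speedup = percent_gain_to_speedup(97)  # about 1.97x on CPU with oneDNN

# Hypothetical workload: a job that takes 80 minutes at baseline
baseline_minutes = 80.0
gpu_minutes = baseline_minutes / gpu_speedup  # runtime shrinks by the speedup factor

print(f"GPU: {gpu_speedup:.2f}x, oneDNN CPU: {onednn_speedup:.2f}x")
print(f"80-minute job on GPU: {gpu_minutes:.1f} minutes")
```

Under this reading, "up to 700%" and "8x" describe the same best-case GPU result. The 97% oneDNN figure corresponds to roughly halving CPU inference time.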

The release also improves accuracy on key tasks, delivering new state-of-the-art results on two popular ones. The first is named entity recognition (NER): among open source NLP libraries, Spark NLP 4.0 now provides the most accurate model on the popular CoNLL-2003 benchmark. The second is coreference resolution, where a BERT-based span classification approach outperforms traditional approaches and libraries.

Spark NLP, which is celebrating just five years in production, is used by 33% of the world's enterprises. That number jumps to 59% among AI practitioners in the healthcare and life science industries, according to Gradient Flow. John Snow Labs' customers include half of the world's top 10 pharmaceutical companies and the three largest US healthcare companies, among others. Organizations including Roche, Mount Sinai, GE Healthcare, Merck, McKesson, and Kaiser Permanente will benefit from the release of Spark NLP 4.0.

“As the most widely used NLP library in the enterprise, we have a responsibility to deliver accurate, production-grade, state-of-the-art NLP software,” said David Talby, CTO, John Snow Labs. “With the pace of technology and business evolution, last year’s best-of-breed AI tools are already falling behind. Our promise to our customers and the open source community is that we will always keep them state-of-the-art—and this new release delivers on that promise.”

To hear more about the latest Spark NLP product enhancements, industry trends, and innovative use cases, join us for the third annual NLP Summit, taking place online October 4-6. The free, three-day event offers immersive, industry-focused content, including more than 50 technical sessions and dedicated focus days on open source, healthcare, and finance. Connect with speakers, network with peers, and access all content on demand after the event.

Follow @JohnSnowLabs on Twitter for the latest news and updates. To learn more about Spark NLP or to start your free trial, visit: https://www.johnsnowlabs.com/spark-nlp/.


Maziyar Panahi is a Principal AI / ML engineer and a senior Team Lead with more than a decade of experience in public research. He leads the team behind Spark NLP at John Snow Labs, one of the most widely used NLP libraries in the enterprise. He develops scalable NLP components using the latest techniques in deep learning and machine learning, including classic ML, language models, speech recognition, and computer vision. He is an expert in designing, deploying, and maintaining ML and DL models at the production level in the JVM ecosystem and on distributed computing engines such as Apache Spark. He has extensive experience in computer networks and DevOps, and has spent the last 15 years designing and implementing scalable solutions on cloud platforms such as AWS, Azure, and OpenStack. Earlier in his career, after completing his Microsoft and Cisco training (MCSE, MCSA, and CCNA), he worked as a network engineer in high-level positions. He is a lecturer at The National School of Geographical Sciences, teaching Big Data Platforms and Data Analytics. He is currently employed by The French National Centre for Scientific Research (CNRS) as an IT Project Manager and works at the Institute of Complex Systems of Paris (ISCPIF).
