Modern Extractive Question Answering Annotators, Notable Performance Improvements, and State-of-the-Art Models Define Spark NLP 4.0
John Snow Labs, the Healthcare AI and NLP company and developer of the Spark NLP library, announced the release of Spark NLP 4.0. With new question answering annotators, major performance improvements, optimizations on new hardware platforms, and more than 1,000 state-of-the-art pre-trained transformer models available in multiple languages, Spark NLP 4.0 is the company’s most significant release this year. This is an example of John Snow Labs’ ongoing commitment to delivering the latest, most accurate NLP software to the global AI community.
The introduction of question answering annotators in Spark NLP enables the software to answer arbitrary natural language questions based on a given document. The models provide both an answer and an explanation of where in the document the answer came from. Hundreds of pre-trained models are available out of the box, based on BERT, ALBERT, DeBERTa, RoBERTa, DistilBERT, Longformer, and XLM-RoBERTa, enabling support for multiple languages, document types, and performance goals. The models come already trained and fine-tuned, so users can start using them in their applications immediately. This is another step in making NLP more easily usable for building production-grade systems.
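As a rough illustration of how such an annotator might be wired into a pipeline, the following Python sketch pairs a question column with a context column and extracts the answer span. It assumes the `spark-nlp` and `pyspark` packages are installed and that the default pre-trained BERT question answering model can be downloaded. The helper names `build_qa_pipeline` and `answer` and the column names are illustrative choices, not part of the announcement.

```python
# Illustrative column names (assumptions, not mandated by the library).
QA_INPUT_COLS = ["question", "context"]
QA_DOC_COLS = ["document_question", "document_context"]


def build_qa_pipeline():
    """Assemble a two-stage extractive question answering pipeline.

    Imports are deferred so that this sketch can be loaded on a machine
    without Spark installed.
    """
    from pyspark.ml import Pipeline
    from sparknlp.base import MultiDocumentAssembler
    from sparknlp.annotator import BertForQuestionAnswering

    # Turn the raw question and context strings into document annotations.
    assembler = (
        MultiDocumentAssembler()
        .setInputCols(QA_INPUT_COLS)
        .setOutputCols(QA_DOC_COLS)
    )
    # Load the library's default pre-trained model for this annotator.
    qa = (
        BertForQuestionAnswering.pretrained()
        .setInputCols(QA_DOC_COLS)
        .setOutputCol("answer")
    )
    return Pipeline(stages=[assembler, qa])


def answer(spark, question, context):
    """Return the answer text extracted from `context`, or None if empty."""
    df = spark.createDataFrame([[question, context]]).toDF(*QA_INPUT_COLS)
    model = build_qa_pipeline().fit(df)
    row = model.transform(df).select("answer.result").first()
    return row[0][0] if row and row[0] else None
```

With a running session (for example, `spark = sparknlp.start()`), calling `answer(spark, "Who develops Spark NLP?", "Spark NLP is developed by John Snow Labs.")` would return the text span the model selects from the context.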
Spark NLP 4.0 is optimized for the latest hardware technologies, with official support for the Apple silicon M1 chip, as well as support for Intel’s oneAPI Deep Neural Network Library (oneDNN). Enabling oneDNN can improve the performance of transformer-based models running on CPUs by up to 97%. Additionally, with enhancements for Nvidia GPUs, users are experiencing performance speedups of up to 700%. Other notable capabilities include support for the latest runtimes of Databricks, AWS EMR, and Kubernetes.
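A minimal Python sketch of switching oneDNN on is shown below. It assumes that Spark NLP's TensorFlow backend honors TensorFlow's `TF_ENABLE_ONEDNN_OPTS` environment variable and that the variable is set before the Spark session starts. The helper `enable_onednn` is a hypothetical name used only for illustration. On a cluster, the variable would presumably also need to be set on the executors, so the deployment documentation should be checked.

```python
import os


def enable_onednn(env=os.environ):
    """Request oneDNN optimizations through TensorFlow's environment switch.

    Assumption: the variable must be in place before the JVM and the
    TensorFlow runtime are launched.
    """
    env["TF_ENABLE_ONEDNN_OPTS"] = "1"
    return env["TF_ENABLE_ONEDNN_OPTS"]


enable_onednn()
# Start the Spark NLP session only after this point, e.g. sparknlp.start().
```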
The release also improves accuracy on key tasks, delivering new state-of-the-art results for two popular ones. The first is named entity recognition (NER), for which Spark NLP 4.0 now provides the most accurate model on the popular CoNLL-2003 benchmark among open source NLP libraries. The second is coreference resolution, which uses BERT-based span classification to outperform traditional approaches and libraries.
After just five years in production, Spark NLP is used by 33% of the world’s enterprises. That number jumps to 59% among AI practitioners in the healthcare and life science industries, according to Gradient Flow. John Snow Labs’ customers include half of the world’s top 10 pharmaceutical companies and the three largest US healthcare companies, among others. Organizations including Roche, Mount Sinai, GE Healthcare, Merck, McKesson, and Kaiser Permanente will benefit from the release of Spark NLP 4.0.
“As the most widely used NLP library in the enterprise, we have a responsibility to deliver accurate, production-grade, state-of-the-art NLP software,” said David Talby, CTO, John Snow Labs. “With the pace of technology and business evolution, last year’s best-of-breed AI tools are already falling behind. Our promise to our customers and the open source community is that we will always keep them state-of-the-art—and this new release delivers on that promise.”
To hear more about the latest Spark NLP product enhancements, industry trends, and innovative use cases, join us for the third annual NLP Summit, taking place online October 4-6. The free, three-day event features immersive, industry-focused content, with over 50 technical sessions and focus days on open source, healthcare, and finance. Connect with speakers, network with peers, and access all content on demand after the event.