The New Open-Source Python NLP Library Enables Data Scientists to Automatically Generate & Run Tests and Deploy Reliable, Safe and Effective Models with Confidence
John Snow Labs, the healthcare AI and NLP company and developer of the Spark NLP library, today announced the release of LangTest, an open-source Python software library that enables data scientists to more easily deliver reliable, safe and effective models.
The need for safer, more equitable, and robust AI models is clear, but there are few tools available to help data scientists achieve this. As a result, current Natural Language Processing models in production are not living up to their promise. Instead, some of the best-known models fail in important ways, from leaking personally identifiable information, to reversing their answers because of typos or capitalization changes, to showing biases around race, gender, physical appearance, disability, and religion. These issues are prevalent in some of the most popular state-of-the-art models in use today.
Rigorous and frequent testing is the antidote. Good tests should be specific, comprehensive, and easy to maintain. They should also be versioned and executable, so that they can become part of an automated build or MLOps workflow. John Snow Labs’ LangTest library offers a simple framework for doing this: it is open source, lightweight, and extensible, supports multiple libraries, and offers a comprehensive testing strategy for both models and data.
The LangTest library can automatically generate and run 50+ test types out of the box, covering accuracy, fairness, bias, representation, and robustness. Multiple NLP tasks can be tested across three of the most popular open-source NLP libraries: Spark NLP, Hugging Face Transformers, and spaCy. The LangTest library also provides automated data augmentation, which in some cases can automatically improve failing models, especially for issues around robustness and fairness.
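To make the idea of a robustness test concrete, here is a minimal sketch in plain Python. It does not use LangTest or reproduce its API. The toy model, the two perturbations, and every function name below are hypothetical, chosen only to illustrate the general technique: perturb each input, rerun the model, and report how often the prediction stays the same.

```python
from typing import Callable, Dict, List


def uppercase(text: str) -> str:
    """Perturbation: change capitalization only."""
    return text.upper()


def swap_first_two_letters(text: str) -> str:
    """Perturbation: simulate a typo in the longest word of the text."""
    words = text.split()
    if not words:
        return text
    # Pick the first longest word and swap its first two characters.
    idx = max(range(len(words)), key=lambda i: len(words[i]))
    word = words[idx]
    if len(word) >= 2:
        words[idx] = word[1] + word[0] + word[2:]
    return " ".join(words)


def toy_sentiment_model(text: str) -> str:
    """Hypothetical, deliberately brittle model: case-sensitive keyword match."""
    return "positive" if "good" in text else "negative"


def run_robustness_tests(
    model: Callable[[str], str],
    samples: List[str],
    perturbations: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Return, per perturbation, the share of samples whose prediction is unchanged."""
    report = {}
    for name, perturb in perturbations.items():
        passed = sum(model(s) == model(perturb(s)) for s in samples)
        report[name] = passed / len(samples)
    return report


samples = ["The service was good", "The wait was terrible"]
report = run_robustness_tests(
    toy_sentiment_model,
    samples,
    {"uppercase": uppercase, "typo": swap_first_two_letters},
)
print(report)
```

In this sketch the toy model keeps its predictions under the typo perturbation but flips one of two predictions when the text is uppercased, which is the kind of failure a robustness test is meant to surface. A pass-rate threshold per test type could then be checked in an automated build.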
The news comes on the first day of the company’s annual Healthcare NLP Summit, a free, virtual event focused on NLP applications in healthcare and life sciences, and the library will be explored in a keynote session titled “Introducing the Open-Source Testing Library for NLP Models.”
“Despite their hype, many AI systems simply don’t work. It is time to set higher standards for engineering AI systems: They should work reliably, and you should be able to prove this to yourself, your customers, and your regulator,” said David Talby, CTO, John Snow Labs. “The LangTest library provides the open-source community with a free, production-grade resource that lets data scientists apply best practices for Responsible AI, and embodies a lot of what we’ve learned over the years in delivering regulatory-grade NLP systems.”
The LangTest library is now live and freely available. To get started, visit https://www.johnsnowlabs.com/langtest/. With a full development team allocated to the project, John Snow Labs is committed to improving the library with frequent releases of new test types, tasks, languages, and platforms.