John Snow Labs Puts Responsible AI into Practice with the Release of the LangTest Library

The New Open-Source Python NLP Library Enables Data Scientists to Automatically Generate & Run Tests and Deploy Reliable, Safe and Effective Models with Confidence

John Snow Labs, the healthcare AI and NLP company and developer of the Spark NLP library, today announced the release of LangTest, an open-source Python software library that enables data scientists to more easily deliver reliable, safe and effective models.

The need for safer, more equitable, and more robust AI models is clear, but few tools are available to help data scientists achieve this. As a result, current Natural Language Processing models in production are not living up to their promise. Some of the best-known models fail on important aspects such as leaking personally identifiable information, reversing their answers because of typos or capitalization changes, and showing biases around race, gender, physical appearance, disability, and religion. These issues are prevalent in some of the most popular state-of-the-art models in use today.

Rigorous and frequent testing is the antidote, and good tests should be specific, comprehensive, and easy to maintain. They should also be versioned and executable, so that they can become part of an automated build or MLOps workflow. John Snow Labs’ LangTest Library offers a simple framework for doing this: it is open source, lightweight, and extensible, supports multiple NLP libraries, and offers a comprehensive testing strategy for both models and data.

The LangTest library can automatically generate and run more than 50 test types out of the box, covering accuracy, fairness, bias, representation, and robustness. Multiple NLP tasks can be tested across three of the most popular open-source NLP libraries: Spark NLP, Hugging Face Transformers, and spaCy. The LangTest library also provides automated data augmentation, which in some cases can improve failing models automatically, especially for robustness and fairness issues.
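To make the idea of a robustness test concrete, the sketch below shows what a capitalization or typo test checks: each input is perturbed, and the test passes for that input when the model's prediction does not change. This is a minimal, self-contained illustration and does not use LangTest's API. The toy model, the function names (`toy_sentiment_model`, `uppercase`, `add_typo`, `robustness_pass_rate`, `run_robustness_test`) and the `min_pass_rate` threshold are all hypothetical.

```python
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for a real NLP model: a toy sentiment classifier that
# only matches lowercase keywords, so it is deliberately sensitive to casing.
def toy_sentiment_model(text: str) -> str:
    positive_words = {"good", "great", "excellent"}
    tokens = text.split()
    return "positive" if any(tok in positive_words for tok in tokens) else "negative"

# Perturbation 1: change capitalization only.
def uppercase(text: str) -> str:
    return text.upper()

# Perturbation 2: swap two adjacent characters at a reproducible position.
def add_typo(text: str, seed: int = 0) -> str:
    if len(text) < 2:
        return text
    rng = random.Random(seed)
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Fraction of samples whose prediction is unchanged after perturbation.
def robustness_pass_rate(
    model: Callable[[str], str],
    samples: List[str],
    perturb: Callable[[str], str],
) -> float:
    passed = sum(model(s) == model(perturb(s)) for s in samples)
    return passed / len(samples)

# A test "passes" when the pass rate meets a (hypothetical) minimum threshold.
def run_robustness_test(
    model: Callable[[str], str],
    samples: List[str],
    perturb: Callable[[str], str],
    min_pass_rate: float = 0.75,
) -> Dict[str, object]:
    rate = robustness_pass_rate(model, samples, perturb)
    return {"pass_rate": rate, "passed": rate >= min_pass_rate}

if __name__ == "__main__":
    samples = ["the food was great", "service was terrible"]
    for name, perturb in [("uppercase", uppercase), ("typo", add_typo)]:
        print(name, run_robustness_test(toy_sentiment_model, samples, perturb))
```

Here the uppercase test flips the prediction for "the food was great", so only half the samples pass and the test fails against the 0.75 threshold. Because each test is ordinary executable code, it can be versioned and run in an automated build, which is the workflow described above.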

The news comes on the first day of the company’s annual Healthcare NLP Summit, a free, virtual event focused on NLP applications in healthcare and life sciences. The library will be explored in a keynote session titled “Introducing the Open-Source Testing Library for NLP Models.”

“Despite their hype, many AI systems simply don’t work. It is time to set higher standards for engineering AI systems: They should work reliably, and you should be able to prove this to yourself, your customers, and your regulator,” said David Talby, CTO, John Snow Labs. “The LangTest library provides the open-source community with a free, production-grade resource that lets data scientists apply best practices for Responsible AI, and embodies a lot of what we’ve learned over the years in delivering regulatory-grade NLP systems.”

LangTest Library is now live and freely available. To get started, visit the LangTest website. With a full development team allocated to the project, John Snow Labs is committed to improving the library with frequent releases of new test types, tasks, languages, and platforms.
