Annotation Lab is now the NLP Lab – the Free No-Code AI by John Snow Labs
Developing high-quality Deep Learning NLP models usually requires significant amounts of training data. The models must be taught to correctly differentiate specific entities and make accurate predictions. This is usually done via examples (training data) provided by human users with solid expertise in the target domain. The most direct way to put together the example data is manual annotation.
Bottlenecks of text annotation
Managing and streamlining data annotation is not an easy task. It comes with several challenges and obstacles that can seriously affect the success of any AI project. The following are some common data annotation challenges that impact team productivity and model quality:
- AI and ML models are data-hungry and need a significant amount of labeled data to learn from. Thus, businesses struggle to secure and manage a highly specialized workforce for generating labeled data to feed the models.
- Annotating documents and preparing the annotations in the format expected by the training pipelines requires specialized tools that improve productivity and ensure coherence and inter-annotator agreement. Developing such a tool from scratch is a highly specialized, effort-intensive, and time-consuming process, and so is maintaining it.
- For training Deep Learning models, skilled data scientists are needed to check the quality of the annotations, train and tune the models and deploy them in production. Such professionals are hard to find and expensive to retain.
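The inter-annotator agreement mentioned in the list above can be quantified in several ways. As a minimal sketch, and not a description of how any of the tools compared here compute it, the following Python snippet calculates Cohen's kappa for two annotators who labeled the same items. The label names and sample data are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, derived from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative data: two annotators disagree on one of six items.
annotator_1 = ["PER", "ORG", "PER", "LOC", "ORG", "PER"]
annotator_2 = ["PER", "ORG", "LOC", "LOC", "ORG", "PER"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value close to 1 indicates strong agreement beyond chance, while a value near 0 suggests the annotators agree no more often than random labeling would.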
Choosing the right tools
This blog compares some of the most commonly used solutions for Text Annotation available on the market and highlights their major features and limitations.
The tools included in this comparison are:
- Annotation Lab
- Label Studio
- LabelBox
- LightTag
- Prodigy
- TagTog
To choose the most suitable solution for your particular annotation problem, start by answering the following questions:
- What content do I need to process?
- How do I manage my team and projects?
- How do I keep my data safe?
- How can I automate the annotation process?
- How much am I willing to pay for my text annotation tool?
Supported Content Types
The starting point of any annotation project is to analyze the documents that need to be processed, both in terms of content and modality. Are you analyzing text, video, or audio content? And what do you need to extract or annotate: named entities, relations, bounding boxes, or something else?
When comparing the support for different content types, Annotation Lab and Label Studio offer the same level of features in the free versions, while Prodigy includes those in the paid edition. LabelBox is missing support for audio content while LightTag and TagTog do not offer any image, video, or audio annotation features.
Projects and Teams
In complex data extraction or validation projects, the work is usually distributed among a team of domain experts acting as annotators or reviewers. Such collaboration demands a software tool for effective project management, including task assignment, tracking, and quality checking.
Among the 6 tools included in the comparison, the largest palette of project management features by far is offered by the Annotation Lab. All those features are included in the community (free) version of the tool.
While the other text annotation tools also cover some important features (e.g., support for multiple projects, API access), these are very often included only in the paid editions (see the case of LightTag, Prodigy, or TagTog). Another example is Task Assignment, a mandatory feature when running team-based projects, which is only available in the free versions of the Annotation Lab, LabelBox, and TagTog.
Projects and Teams Features
The situation is very similar for collaboration features such as consensus analysis, feedback and comments, out-of-the-box review workflows, and performance dashboards. All of these are available in the Annotation Lab for free, while the other tools, when they include them at all, reserve them for the enterprise/paid editions.
Security and Privacy
When annotating enterprise data, you often need to handle Personally Identifiable Information (PII) and Protected Health Information (PHI) in a secure and privacy-aware setup. This often means deploying the NLP annotation tool on your own premises and avoiding data sharing or SaaS setups.
Among the 6 tools compared here, Annotation Lab is the only one that offers enterprise-grade security and privacy features for free:
- Zero data sharing
- Role-based access
- Full audit trails
- Multi-factor authentication.
LightTag and TagTog are right behind with Enterprise support for the majority of listed features except for annotation versioning. This makes it difficult to run experiments on your projects with different versions of the data.
Security and Privacy Features
Pre-annotation is the process of generating annotations for a set of documents/tasks using an existing model before a human annotator manually completes/corrects/validates them. It results in crucial time savings for annotators as it increases the annotation speed.
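As a rough illustration of the pre-annotation idea described above, the sketch below proposes entity spans that a human annotator would then accept, correct, or reject. It is not how Annotation Lab or any other tool implements the feature: the `GAZETTEER` dictionary is a hypothetical stand-in for a pretrained model, and the field names in each suggestion are assumptions chosen for readability.

```python
import re

# Hypothetical stand-in for a pretrained model: a small term-to-label dictionary.
GAZETTEER = {"aspirin": "DRUG", "ibuprofen": "DRUG", "headache": "SYMPTOM"}


def pre_annotate(text):
    """Return candidate entity spans for a human annotator to review."""
    suggestions = []
    for term, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            suggestions.append({
                "start": match.start(),
                "end": match.end(),
                "text": match.group(),
                "label": label,
                "status": "pre-annotated",  # still awaits human validation
            })
    # Order suggestions by position so the reviewer reads them left to right.
    return sorted(suggestions, key=lambda s: s["start"])


task = "Patient took Aspirin for a headache."
candidates = pre_annotate(task)
for c in candidates:
    print(c["start"], c["end"], c["text"], c["label"])
```

The annotator starts from these candidates instead of an empty document, which is where the time savings come from.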
This feature is freely available in John Snow Labs’ Annotation Lab platform. Annotation Lab facilitates an end-to-end process from document import, pre-annotation, manual corrections, and manual annotation to model training and testing without writing a line of code. It also offers seamless integration with the NLP Models Hub, from where users can download and reuse hundreds of pre-trained models so they don’t waste time on already learned tasks.
Model-based pre-annotation is also possible in Label Studio via third-party ML integrations that need to be set up by users. LabelBox, LightTag, TagTog, and Prodigy only offer this type of automation in the paid versions.
No Code Model Training
If you want to go beyond annotated data and obtain a fully functional, production-ready NLP model, the only platform in this comparison that lets you do so without involving data scientists and without writing a line of code is the Annotation Lab.
Once enough training data is available (e.g. your team annotated at least 40–50 examples for each entity in your taxonomy) you can start training a new model. This can be done from scratch or by tuning an already existing pretrained model.
Annotation Lab also offers active learning features, which trigger the model training automatically in the background when target milestones are reached. It can be configured to run when 50, 100 or 200 new completions are available.
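The milestone-based trigger described above can be pictured with a short sketch. This is an assumption about the general pattern, not Annotation Lab's actual implementation: the `TrainingTrigger` class, its `train_fn` callback, and the counter reset after each run are all hypothetical.

```python
class TrainingTrigger:
    """Starts a training run each time enough new completions accumulate."""

    def __init__(self, threshold, train_fn):
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold      # e.g. 50, 100, or 200 new completions
        self.train_fn = train_fn        # hypothetical callback that launches training
        self.new_completions = 0
        self.runs = 0

    def record_completion(self):
        """Call once per newly submitted completion."""
        self.new_completions += 1
        if self.new_completions >= self.threshold:
            self.train_fn(self.new_completions)
            self.runs += 1
            self.new_completions = 0    # start counting toward the next milestone


# Illustrative use: record how many completions each training run was started with.
trained_on = []
trigger = TrainingTrigger(threshold=50, train_fn=trained_on.append)
for _ in range(120):
    trigger.record_completion()
print(trigger.runs, trained_on, trigger.new_completions)
```

In a real deployment the callback would enqueue a background training job instead of appending to a list, so annotators can keep working while the model retrains.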
Getting it All for Free
At the time of this comparison, 5 out of the 6 tools offered free versions, but 4 of them imposed important limitations on the available features.
Tools Editions and Limitations
If you need a flexible and powerful end-to-end platform for document annotation and model training, Annotation Lab is the tool to choose. You can deploy it in the cloud or on your own premises, with enterprise-level security and privacy features and no limitations on the number of projects, tasks, users, models, or pre-annotations. It is well suited to both data scientists and domain experts, as all of its features are available via both the UI and the API.
Annotation Lab can be installed for free via the AWS and Azure Marketplaces. You can also install it locally on any Ubuntu server by following the instructions detailed here.