Visual NER Automated Preannotation and Training in the Annotation Lab

01.08.2022

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/

Annotation Lab v3.4.0 brings support for Visual NER Automated Preannotation and Model Training. The Spark NLP and Spark NLP for Healthcare libraries are upgraded to version 4.0. Fixes for known security issues and bugs are also included in this release. Here are the highlights:

Visual NER Training

Version 3.4.0 of the Annotation Lab offers the ability to train Visual NER models, apply active learning for automatic model training, and preannotate image-based tasks with existing models in order to accelerate annotation work.

License Requirements

Visual NER annotation, training, and preannotation features depend on the presence of a Spark OCR license. Floating or airgap licenses with the scopes ocr: inference and ocr: training are required for preannotation and training, respectively.
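As a rough illustration of the scope rule above, the sketch below maps each feature to the scope it needs and reports which features a set of license scopes would unlock. The function and dictionary names are hypothetical, and the scope strings are normalized by dropping whitespace; this is not the Annotation Lab's actual license check.

```python
from typing import Dict, List

# Hypothetical mapping of Visual NER features to the license scope each needs,
# following the rule stated in this post. Not the product's real license check.
REQUIRED_SCOPE: Dict[str, str] = {
    "preannotation": "ocr:inference",
    "training": "ocr:training",
}


def normalize(scope: str) -> str:
    """Lowercase a scope string and drop whitespace ('ocr: training' -> 'ocr:training')."""
    return "".join(scope.lower().split())


def allowed_features(license_scopes: List[str]) -> List[str]:
    """Return the features that the given license scopes unlock."""
    present = {normalize(s) for s in license_scopes}
    return [feature for feature, scope in REQUIRED_SCOPE.items() if scope in present]
```

With only an inference scope, for example, preannotation would be available but training would not.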

Dashboard displaying presence of a Spark OCR license

Model Training

The training feature for Visual NER projects can be activated from the Setup page via the “Train Now” button (see 1). From the Training Settings section, users can tune the training parameters (e.g. Epoch, Batch) and choose the tasks to use for training the Visual NER model (see 3).

Information on the training progress is shown in the top right corner of the Model Training tab (See 2). Users can check detailed information regarding the success or failure of the last training.

Training can fail because of:

Insufficient number of completions
Poor quality of completions
Insufficient CPU and Memory
Wrong training parameters

Train Now button and Model Training tab in Visual NER projects

When triggering the training, users can choose to immediately deploy the model or just train it without deploying. If immediate deployment is chosen, then the labeling config is updated with references to the new model so that it will be used for preannotations.

Project Configuration for Visual NER model

Training Server Specification

Visual NER training requires a minimum of 64 GB RAM and a 16-core CPU.

Visual NER Preannotation

To run preannotation on one or several tasks, the Project Owner or the Manager must select the target tasks and then click the Preannotate button in the upper right of the Tasks page. This displays a popup with information about the last deployment, including the list of deployed models and the labels they predict.

Known Limitations:

When bulk preannotation is run on a large number of tasks, the preannotation can fail due to memory issues.
Preannotation currently works at token level, and does not merge all tokens of a chunk into one entity.
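Until chunk merging is supported, token-level predictions could be merged in a post-processing step. The sketch below is one simple heuristic, not the product's behavior: it assumes tokens arrive in reading order as dictionaries with hypothetical `text` and `label` keys, and that unlabeled tokens carry the label "O".

```python
from itertools import groupby
from typing import Dict, List


def merge_tokens(tokens: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge runs of consecutive tokens that share a label into one chunk.

    Assumes tokens are in reading order and that unlabeled tokens carry
    the label "O". Two adjacent entities with the same label would be
    merged into one, which is a known weakness of this simple heuristic.
    """
    chunks = []
    for label, run in groupby(tokens, key=lambda t: t["label"]):
        if label == "O":
            continue  # skip tokens that are not part of any entity
        chunks.append({"text": " ".join(t["text"] for t in run), "label": label})
    return chunks
```

For image-based tasks, a fuller version would also need to merge the bounding boxes of the tokens in each run; that part is left out here.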

Preannotation Server Specification

Preannotation with a Visual NER model requires a minimum of 16 GB RAM and a 4-core CPU.
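The two minimums quoted in this post (64 GB RAM and 16 cores for training, 16 GB RAM and 4 cores for preannotation) can be captured in a small lookup, for example to sanity-check a server before it is provisioned. The names below are hypothetical and the check is only a sketch; it is not part of the Annotation Lab.

```python
from typing import Dict, Tuple

# Minimum (RAM in GB, CPU cores) quoted in this post for each workload.
MINIMUM_SPEC: Dict[str, Tuple[int, int]] = {
    "training": (64, 16),
    "preannotation": (16, 4),
}


def meets_minimum(workload: str, ram_gb: float, cpu_cores: int) -> bool:
    """Return True if the server meets the quoted minimum for the workload."""
    min_ram, min_cores = MINIMUM_SPEC[workload]
    return ram_gb >= min_ram and cpu_cores >= min_cores
```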

Spark NLP and Spark NLP for Healthcare upgrades

Annotation Lab 3.4.0 uses Spark NLP 4.0.0, Spark NLP for Healthcare 4.0.2, and Spark OCR 3.13.0. With this upgrade, users can see many new models in the Models Hub.

Confusion Matrix for Classification Projects

A checkbox has been added to the training page to enable the generation of a confusion matrix for classification projects. The confusion matrix is visible in the live training logs as well as in the downloaded training logs.
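For readers unfamiliar with the term, a confusion matrix counts how often each actual class was predicted as each class. The sketch below shows the general computation in plain Python. It does not reproduce the format of the Annotation Lab training logs, which this post does not describe.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, Dict[str, int]]:
    """Count (actual, predicted) pairs; rows are actual classes, columns are predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occurred, so every cell is filled.
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}
```

Diagonal cells (the same actual and predicted class) are correct predictions, and off-diagonal cells show which classes the model confuses.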

Training logs showing the confusion matrix for a classification project

Miscellaneous

Project Import Improvements

Until now, the imported project’s name was set based on the previously exported project name. From this version, the name of the imported project is set according to the name of the imported zip file, which makes it easier for the user to change the project name before importing the project if necessary. In addition, a user can now make changes in the content of the exported zip and then zip it back for import into Annotation Lab.
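The unzip, edit, and re-zip workflow described above can be scripted. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and nothing is assumed about the internal layout of the exported zip, which this post does not describe.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def repack_project(exported_zip: str, new_name: str, out_dir: str) -> Path:
    """Unpack an exported project zip and repack it as <new_name>.zip in out_dir."""
    with tempfile.TemporaryDirectory() as workdir:
        with zipfile.ZipFile(exported_zip) as archive:
            archive.extractall(workdir)
        # Edit files under workdir here if the project content needs changes.
        # make_archive appends ".zip" to the base name it is given.
        archive_path = shutil.make_archive(
            str(Path(out_dir) / new_name), "zip", root_dir=workdir
        )
    return Path(archive_path)
```

Because the imported project takes its name from the zip file, the `new_name` argument would determine the project name after import.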

Example of how to export and import zips into Annotation Lab

Task Pagination in the Labeling page

In earlier versions of the Annotation Lab, a task from a text-based project was paginated based on a predefined or custom number of words. This was problematic, especially when tasks were imported after OCR and included special characters. From this version on, tasks are paginated based on the number of characters they contain.
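Character-based pagination does not depend on how words are delimited, which is why it behaves more predictably on OCR output. A minimal sketch of the idea, assuming fixed-size character windows (the exact page boundaries used by the Annotation Lab may differ):

```python
from typing import List


def paginate(text: str, page_size: int) -> List[str]:
    """Split text into consecutive pages of at most page_size characters."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [text[i:i + page_size] for i in range(0, len(text), page_size)]
```

Concatenating the pages always gives back the original text, whatever special characters it contains.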

Confidence filter slider visible only for preannotation

Previously, the confidence filter was applied to both predictions and completions. Since all manual annotations have a confidence score of 1, we decided to only show and apply the confidence filter when the prediction widget is selected.
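The reasoning above can be sketched as a filter that is applied to predictions only. The record shape (a `confidence` key) and the `source` values are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def visible_annotations(annotations: List[Dict], source: str, min_confidence: float) -> List[Dict]:
    """Apply the confidence threshold to predictions only.

    Completions are manual annotations with confidence 1, so filtering them
    would never hide anything; they are returned unchanged.
    """
    if source != "prediction":
        return list(annotations)
    return [a for a in annotations if a["confidence"] >= min_confidence]
```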

Swagger Docs Changes

The API docs have been restructured for easier use, and new methods have been added to mirror the new functionality offered via the UI.

Confidence score for Rules preannotations

The confidence of rule-based preannotations is now visible on the Labeling screen, just like that of model-based preannotations.

Get & install it HERE.

Full feature set HERE.
