
Visual NER Automated Preannotation and Training in the Annotation Lab

A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here

Annotation Lab v3.4.0 brings support for Visual NER Automated Preannotation and Model Training. The Spark NLP and Spark NLP for Healthcare libraries are upgraded to version 4.0. Fixes for known security issues and bugs are also included in this release. Here are the highlights:

Visual NER Training

Version 3.4.0 of the Annotation Lab offers the ability to train Visual NER models, apply active learning for automatic model training, and preannotate image-based tasks with existing models in order to accelerate annotation work.

License Requirements

Visual NER annotation, training, and preannotation features require a Spark OCR license. Floating or airgap licenses with scopes ocr: inference and ocr: training are required for preannotation and training, respectively.

Dashboard displaying presence of a Spark OCR license

Model Training

The training feature for Visual NER projects can be activated from the Setup page via the “Train Now” button (see 1). From the Training Settings section, users can tune the training parameters (e.g. Epoch, Batch) and choose the tasks to use for training the Visual NER model (see 3).

Information on the training progress is shown in the top right corner of the Model Training tab (See 2). Users can check detailed information regarding the success or failure of the last training.

Training failure can occur because of:

  • Insufficient number of completions
  • Poor quality of completions
  • Insufficient CPU and Memory
  • Wrong training parameters

Train Now button and Model Training tab in Visual NER projects

When triggering the training, users can choose to immediately deploy the model or just train it without deploying. If immediate deployment is chosen, then the labeling config is updated with references to the new model so that it will be used for preannotations.

Project Configuration for Visual NER model

Training Server Specification

Visual NER training requires at least 64 GB of RAM and a 16-core CPU.

Visual NER Preannotation

To run preannotation on one or several tasks, the Project Owner or the Manager must select the target tasks and then click the Preannotate button in the upper right corner of the Tasks page. This displays a popup with information about the last deployment, including the list of deployed models and the labels they predict.

Visual NER Preannotation labels

Known Limitations:

  • When bulk preannotation is run on a large number of tasks, the preannotation can fail due to memory issues.
  • Preannotation currently works at token level, and does not merge all tokens of a chunk into one entity.
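To illustrate the second limitation, the sketch below shows one way token-level predictions could be merged into chunk-level entities after export. It is only an illustration: the `(text, label)` tuple format, the `"O"` outside label, and the function name are assumptions made for this example, not the Annotation Lab data model or API.

```python
def merge_token_predictions(tokens, outside_label="O"):
    """Merge consecutive tokens that share a label into one chunk.

    `tokens` is a list of (text, label) pairs in reading order.
    This input format is hypothetical and used only for illustration.
    """
    chunks = []
    previous_label = outside_label
    for text, label in tokens:
        if label == outside_label:
            # Unlabeled tokens break any chunk that is in progress.
            previous_label = outside_label
            continue
        if label == previous_label and chunks:
            # Same label as the previous token: extend the last chunk.
            last_text, _ = chunks[-1]
            chunks[-1] = (last_text + " " + text, label)
        else:
            # A new label starts a new chunk.
            chunks.append((text, label))
        previous_label = label
    return chunks


token_level = [
    ("John", "NAME"), ("Smith", "NAME"), ("visited", "O"),
    ("Boston", "CITY"), ("General", "ORG"), ("Hospital", "ORG"),
]
chunk_level = merge_token_predictions(token_level)
```

With token-level output, "John" and "Smith" are two separate NAME entities; after merging they form a single "John Smith" chunk.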

Preannotation Server Specification

Visual NER preannotation requires at least 16 GB of RAM and a 4-core CPU.
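The two minimum specifications stated in this release can be captured in a small lookup for sanity-checking a server before starting a job. This is a minimal sketch: the dictionary keys and the `meets_minimum` helper are hypothetical names, and only the RAM and CPU figures come from the release notes.

```python
# Minimum resources stated in the release notes for Visual NER workloads.
MINIMUM_SPECS = {
    "training": {"ram_gb": 64, "cpu_cores": 16},
    "preannotation": {"ram_gb": 16, "cpu_cores": 4},
}


def meets_minimum(workload, ram_gb, cpu_cores):
    """Return True if the given resources satisfy the stated minimum."""
    spec = MINIMUM_SPECS[workload]
    return ram_gb >= spec["ram_gb"] and cpu_cores >= spec["cpu_cores"]


# A 32 GB, 16-core server is enough for preannotation but not for training.
can_preannotate = meets_minimum("preannotation", ram_gb=32, cpu_cores=16)
can_train = meets_minimum("training", ram_gb=32, cpu_cores=16)
```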

Spark NLP and Spark NLP for Healthcare upgrades

Annotation Lab 3.4.0 uses Spark NLP 4.0.0, Spark NLP for Healthcare 4.0.2 and Spark OCR 3.13.0. With this upgrade, users can see many new models in the Models Hub.

Confusion Matrix for Classification Projects

A checkbox has been added to the training page to enable the generation of a confusion matrix for classification projects. The confusion matrix is visible in the live training logs as well as in the downloaded training logs.

Training logs showing the confusion matrix for classification projects
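For readers unfamiliar with the term, a confusion matrix counts how often each true class was predicted as each class. The sketch below computes one in plain Python; it is a generic illustration with made-up labels, not the code the Annotation Lab uses, and the layout in the actual training logs may differ.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (labels, matrix) where matrix[i][j] counts samples whose
    true class is labels[i] and predicted class is labels[j]."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Illustrative labels only; not taken from a real project.
y_true = ["pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
labels, matrix = confusion_matrix(y_true, y_pred)
```

Rows correspond to true classes and columns to predicted classes, so the diagonal holds the correctly classified samples.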


Project Import Improvements

Until now, the imported project’s name was set based on the previously exported project name. From this version, the name of the imported project is set according to the name of the imported zip file, which makes it easier for the user to change the project name before importing the project if necessary. In addition, a user can now make changes in the content of the exported zip and then zip it back for import into Annotation Lab.

Example of how to export and import zips into the Annotation Lab
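The edit-and-rezip workflow described above can also be scripted. The following sketch extracts an exported archive, lets a caller-supplied function modify the files, and writes a new zip whose file name would become the imported project's name. The function name `repack_export` and the file names in the usage note are hypothetical; the internal layout of an Annotation Lab export is not described here, so the sketch makes no assumptions about it.

```python
import tempfile
import zipfile
from pathlib import Path


def repack_export(source_zip, target_zip, edit):
    """Extract source_zip, let `edit` change the files, write target_zip.

    `edit` is a caller-supplied function that receives the extraction
    directory as a pathlib.Path and modifies files in place.
    """
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        with zipfile.ZipFile(source_zip) as archive:
            archive.extractall(root)
        edit(root)
        with zipfile.ZipFile(target_zip, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    # Store paths relative to the archive root.
                    archive.write(path, path.relative_to(root).as_posix())
```

For example, `repack_export("exported.zip", "my_new_project.zip", edit)` would produce an archive that, according to the naming rule above, is imported as a project named after the new zip file.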

Task Pagination in the Labeling page

In earlier versions of the Annotation Lab, a task from a text-based project was paginated based on a predefined or custom number of words. This was problematic, especially when tasks were imported after OCR and included special characters. From this version on, tasks are paginated based on the number of characters they contain.
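The difference between the two strategies can be seen in a small sketch. Both functions below are naive illustrations with hypothetical names, not the Annotation Lab implementation, which may for instance avoid splitting in the middle of a word. The sample string imitates OCR output in which a long run of special characters contains no whitespace.

```python
def paginate_by_words(text, words_per_page):
    """Split text into pages holding a fixed number of whitespace-separated words."""
    if words_per_page <= 0:
        raise ValueError("words_per_page must be positive")
    words = text.split()
    return [" ".join(words[i:i + words_per_page])
            for i in range(0, len(words), words_per_page)]


def paginate_by_characters(text, chars_per_page):
    """Split text into pages holding a fixed number of characters."""
    if chars_per_page <= 0:
        raise ValueError("chars_per_page must be positive")
    return [text[i:i + chars_per_page]
            for i in range(0, len(text), chars_per_page)]


# Made-up OCR-like text: dot leaders glue two words into one long "word".
ocr_text = "Invoice" + "." * 40 + "Total 42"

word_pages = paginate_by_words(ocr_text, words_per_page=2)
char_pages = paginate_by_characters(ocr_text, chars_per_page=20)
```

Word-based pagination sees only two "words" here and puts the entire text on one page regardless of its length, whereas character-based pagination yields pages of bounded size.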

Confidence filter slider visible only for preannotation

Previously, the confidence filter was applied to both predictions and completions. Since all manual annotations have a confidence score of 1, we decided to only show and apply the confidence filter when the prediction widget is selected.

Confidence filter for preannotation
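The reasoning behind this change can be shown with a minimal sketch. The dictionary format and the function name are assumptions made for illustration only: a threshold removes low-confidence predictions, but it can never remove a manual annotation, because those always carry a confidence of 1.

```python
def apply_confidence_filter(annotations, min_confidence):
    """Keep only annotations whose confidence is at least min_confidence."""
    return [a for a in annotations if a["confidence"] >= min_confidence]


# Hypothetical data: model predictions carry varying confidence scores,
# while manual annotations (completions) always have confidence 1.0.
predictions = [
    {"label": "DATE", "confidence": 0.95},
    {"label": "NAME", "confidence": 0.40},
]
completions = [
    {"label": "DATE", "confidence": 1.0},
    {"label": "NAME", "confidence": 1.0},
]

kept_predictions = apply_confidence_filter(predictions, 0.5)
kept_completions = apply_confidence_filter(completions, 0.99)
```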

Swagger Docs Changes

The API docs have been restructured for easier use, and new methods have been added to mirror the new functionality offered via the UI.


Confidence score for Rules preannotations

The confidence of rule-based preannotations is now visible on the Labeling screen, the same as that of model-based preannotation.

Get & install it HERE.

Full feature set HERE.
