Annotation Lab v3.4.0 brings support for Visual NER Automated Preannotation and Model Training. The Spark NLP and Spark NLP for Healthcare libraries are upgraded to version 4.0. This release also includes fixes for known security issues and bugs. Here are the highlights:
Visual NER Training
Version 3.4.0 of the Annotation Lab offers the ability to train Visual NER models, apply active learning for automatic model training, and preannotate image-based tasks with existing models in order to accelerate annotation work.
Visual NER annotation, training and preannotation features require a Spark OCR license. Floating or airgap licenses with the ocr: inference scope are required for preannotation, and licenses with the ocr: training scope are required for training.
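To make the license requirement concrete, the sketch below maps each Visual NER feature to the scope named above. It is a hypothetical helper: `REQUIRED_SCOPES` and `enabled_visual_ner_features` are illustrative names, and representing a license as a plain list of scope strings is an assumption, not how Annotation Lab actually reads its licenses.

```python
# Hypothetical mapping: the scope strings come from the release notes, but
# modeling a license as a list of scope strings is an assumption.
REQUIRED_SCOPES = {
    "preannotation": "ocr: inference",
    "training": "ocr: training",
}


def enabled_visual_ner_features(license_scopes):
    """Return, sorted, the Visual NER features unlocked by the given scopes."""
    scopes = {scope.strip() for scope in license_scopes}
    return sorted(
        feature
        for feature, required in REQUIRED_SCOPES.items()
        if required in scopes
    )


print(enabled_visual_ner_features(["ocr: inference"]))  # ['preannotation']
```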
The training feature for Visual NER projects can be activated from the Setup page via the “Train Now” button (See 1). In the Training Settings section, users can tune the training parameters (e.g. Epoch, Batch) and choose the tasks used to train the Visual NER model (See 3).
Information on the training progress is shown in the top right corner of the Model Training tab (See 2), where users can also check detailed information on the success or failure of the last training run.
Training can fail because of:
- Insufficient number of completions
- Poor quality of completions
- Insufficient CPU and memory
- Wrong training parameters
When triggering the training, users can choose to immediately deploy the model or just train it without deploying. If immediate deployment is chosen, then the labeling config is updated with references to the new model so that it will be used for preannotations.
Training Server Specification
The minimum configuration required for Visual NER training is 64 GB RAM and a 16-core CPU.
Visual NER Preannotation
To run preannotation on one or several tasks, the Project Owner or Manager selects the target tasks and clicks the Preannotate button in the upper right corner of the Tasks page. This displays a popup with information about the last deployment, including the list of deployed models and the labels they predict.
- Bulk preannotation on a large number of tasks can fail due to memory issues.
- Preannotation currently works at token level and does not merge the tokens of a chunk into a single entity.
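Both limitations can be worked around outside the tool. The sketch below, assuming you post-process exported results yourself, splits task IDs into smaller batches (so each preannotation run touches fewer tasks) and merges consecutive tokens that carry the same label into one chunk. The `(text, label)` pair format and the `"O"` outside label are assumptions, and `batched` and `merge_token_predictions` are hypothetical names. Note that this naive merge joins two adjacent entities of the same type into one; tagging schemes such as BIO exist to avoid exactly that.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def merge_token_predictions(tokens, outside="O"):
    """Merge consecutive tokens with the same label into (chunk_text, label).

    `tokens` is a list of (text, label) pairs in reading order; tokens
    labeled `outside` are not entities and end the current chunk.
    """
    chunks = []
    prev_label = outside
    for text, label in tokens:
        if label != outside and label == prev_label:
            # Same label as the previous token: extend the current chunk.
            chunks[-1] = (chunks[-1][0] + " " + text, label)
        elif label != outside:
            # New entity label: start a new chunk.
            chunks.append((text, label))
        prev_label = label
    return chunks


print(list(batched(["t1", "t2", "t3", "t4", "t5"], 2)))
# [['t1', 't2'], ['t3', 't4'], ['t5']]
tokens = [("John", "PER"), ("Smith", "PER"), ("visited", "O"),
          ("New", "LOC"), ("York", "LOC")]
print(merge_token_predictions(tokens))
# [('John Smith', 'PER'), ('New York', 'LOC')]
```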
Preannotation Server Specification
The minimum configuration required for Visual NER preannotation is 16 GB RAM and a 4-core CPU.
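The two minimum specifications can be captured in a small check. This is a hypothetical sketch: `MIN_SPECS` and `meets_spec` are illustrative names, and the RAM and core figures are passed in by hand rather than read from the server.

```python
# Minimum specifications quoted in these release notes.
MIN_SPECS = {
    "training": {"ram_gb": 64, "cpu_cores": 16},
    "preannotation": {"ram_gb": 16, "cpu_cores": 4},
}


def meets_spec(workload, ram_gb, cpu_cores):
    """Return True if the given RAM (GB) and core count meet the workload's minimum."""
    spec = MIN_SPECS[workload]
    return ram_gb >= spec["ram_gb"] and cpu_cores >= spec["cpu_cores"]


print(meets_spec("training", ram_gb=32, cpu_cores=16))      # False: RAM too low
print(meets_spec("preannotation", ram_gb=32, cpu_cores=8))  # True
```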
Spark NLP and Spark NLP for Healthcare upgrades
Annotation Lab 3.4.0 uses Spark NLP 4.0.0, Spark NLP for Healthcare 4.0.2 and Spark OCR 3.13.0. With this upgrade, users can access many new models in the Models Hub.
Confusion Matrix for Classification Projects
A checkbox has been added to the training page to enable generation of a confusion matrix for classification projects. The confusion matrix is shown in the live training logs as well as in the downloaded training logs.
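For readers unfamiliar with the output, a confusion matrix counts how often each true label was predicted as each label. The sketch below computes one in plain Python, with rows for true labels and columns for predicted labels, a common convention (also used by scikit-learn); the exact layout in the Annotation Lab training logs may differ, and `confusion_matrix` here is a hypothetical stand-alone function.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts items whose true
    label is labels[i] and whose predicted label is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    # Count each (true, predicted) pair once, then lay the counts out as a grid.
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


labels, matrix = confusion_matrix(["pos", "pos", "neg", "neg"],
                                  ["pos", "neg", "neg", "neg"])
print(labels)  # ['neg', 'pos']
print(matrix)  # [[2, 0], [1, 1]]
```

Off-diagonal cells show the mistakes: here one "pos" item was classified as "neg".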
Project Import Improvements
Until now, an imported project’s name was taken from the name of the previously exported project. From this version on, the imported project is named after the imported zip file, so users can change the project name before importing simply by renaming the zip file. In addition, users can now edit the contents of an exported zip and zip it back up for import into Annotation Lab.
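The renaming and re-zipping workflow can be sketched with the Python standard library. The helper names `project_name_from_zip` and `rezip_export` are hypothetical, and the sketch assumes the exported files sit at the root of the extracted folder; check the layout of your own export before re-zipping, since the expected structure inside the archive is not documented here.

```python
import shutil
from pathlib import Path


def project_name_from_zip(zip_path):
    """Per these notes, the imported project takes the zip file's name."""
    return Path(zip_path).stem


def rezip_export(extracted_dir, zip_name):
    """Re-create an archive from an edited export directory.

    The archive's file name (without .zip) becomes the imported project's
    name. Assumes the files belong at the archive root. Returns the path
    of the new .zip file.
    """
    base = Path(zip_name)
    if base.suffix == ".zip":
        base = base.with_suffix("")  # make_archive appends ".zip" itself
    return shutil.make_archive(str(base), "zip", root_dir=extracted_dir)


print(project_name_from_zip("exports/clinical_ner_v2.zip"))  # clinical_ner_v2
```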
Task Pagination on the Labeling Page
In earlier versions of the Annotation Lab, tasks in text-based projects were paginated based on a predefined or custom number of words. This was problematic, especially for tasks that were imported after OCR and included special characters. From this version on, tasks are paginated based on the number of characters they contain.
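The exact pagination rule is not described in these notes; the sketch below shows one plausible character-based approach, in which each page holds at most `max_chars` characters and breaks at the last whitespace before the limit when possible. `paginate_by_characters` is a hypothetical name. Because the pages are contiguous slices, joining them always reproduces the original text.

```python
def paginate_by_characters(text, max_chars):
    """Split text into pages of at most max_chars characters.

    Breaks after the last whitespace before the limit when there is one,
    so words are not cut in half; a run with no whitespace is split hard.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    pages = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Last space, tab, or newline inside the current window (-1 if none).
            cut = max(text.rfind(ch, start, end) for ch in " \t\n")
            if cut > start:
                end = cut + 1  # keep the whitespace on the current page
        pages.append(text[start:end])
        start = end
    return pages


print(paginate_by_characters("hello world foo", 8))  # ['hello ', 'world ', 'foo']
```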
Confidence filter slider visible only for preannotation
Previously, the confidence filter was applied to both predictions and completions. Since all manual annotations have a confidence score of 1, we decided to only show and apply the confidence filter when the prediction widget is selected.
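The behavior can be illustrated with a small filter. In this sketch, `Region`, `regions_to_display`, and the widget names are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption; completions pass through unfiltered, matching the change described above.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str
    confidence: float


def regions_to_display(widget, predictions, completions, threshold):
    """Apply the confidence threshold only when the prediction widget is active."""
    if widget == "prediction":
        return [r for r in predictions if r.confidence >= threshold]
    # Manual annotations always have confidence 1, so they are shown as-is.
    return completions


preds = [Region("PER", 0.9), Region("LOC", 0.4)]
comps = [Region("ORG", 1.0)]
print(regions_to_display("prediction", preds, comps, 0.5))
# [Region(label='PER', confidence=0.9)]
```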
Swagger Docs Changes
The API docs have been restructured for easier use, and new methods have been added to mirror the new functionality offered via the UI.
Confidence score for Rules preannotations
The confidence of rule-based preannotations is now visible on the Labeling screen, just like that of model-based preannotations.