
    Visual NER Automated Preannotation and Training in the Annotation Lab

    Ph.D. in Computer Science – Head of Product

    A new generation of the NLP Lab is now available: the Generative AI Lab. See details at https://www.johnsnowlabs.com/nlp-lab/

    Annotation Lab v3.4.0 brings support for Visual NER Automated Preannotation and Model Training. The Spark NLP and Spark NLP for Healthcare libraries are upgraded to version 4.0. The release also includes fixes for known security issues and bugs. Here are the highlights:

    Visual NER Training

    Version 3.4.0 of the Annotation Lab can train Visual NER models, use active learning to train models automatically, and preannotate image-based tasks with existing models to speed up annotation work.

    License Requirements

    Visual NER annotation, training and preannotation features depend on the presence of a Spark OCR license. A floating or airgap license with the ocr: inference scope is required for preannotation, and one with the ocr: training scope is required for training.
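The scope-to-feature rule above can be sketched as a small lookup. This is a hypothetical helper, not part of Annotation Lab: the names REQUIRED_SCOPE and enabled_features, and the idea of passing the license scopes as a set of strings, are assumptions; only the two scope names come from the release notes.

```python
from __future__ import annotations

# Hypothetical helper: which Visual NER features a set of license scopes unlocks.
# Only the scope names come from the release notes; everything else is assumed.
REQUIRED_SCOPE = {
    "preannotation": "ocr: inference",
    "training": "ocr: training",
}


def _norm(scope: str) -> str:
    # Ignore spacing and case so "ocr:training" and "ocr: training" compare equal.
    return scope.replace(" ", "").lower()


def enabled_features(license_scopes: set[str]) -> set[str]:
    """Return the Visual NER features whose required scope is present."""
    present = {_norm(s) for s in license_scopes}
    return {
        feature
        for feature, scope in REQUIRED_SCOPE.items()
        if _norm(scope) in present
    }


print(sorted(enabled_features({"ocr: inference"})))  # ['preannotation']
```

A license carrying both scopes would unlock both preannotation and training.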

    Dashboard displaying presence of a Spark OCR license

    Model Training

    The training feature for Visual NER projects can be activated from the Setup page via the “Train Now” button (See 1). From the Training Settings section, users can tune the training parameters (e.g. Epoch, Batch) and choose the tasks to use for training the Visual NER model (See 3).

    Training progress is shown in the top right corner of the Model Training tab (See 2), where users can also check detailed information about the success or failure of the last training run.

    Training can fail because of:

    • Insufficient number of completions
    • Poor quality of completions
    • Insufficient CPU and Memory
    • Wrong training parameters
    Train Now button and Model Training tab in Visual NER projects

    When triggering the training, users can choose to immediately deploy the model or just train it without deploying. If immediate deployment is chosen, then the labeling config is updated with references to the new model so that it will be used for preannotations.

    Project Configuration for Visual NER model

    Training Server Specification

    The minimum configuration required for Visual NER training is 64 GB RAM and a 16-core CPU.

    Visual NER Preannotation

    To run preannotation on one or more tasks, the Project Owner or Manager selects the target tasks and clicks the Preannotate button in the upper right corner of the Tasks page. This displays a popup with information about the last deployment, including the list of deployed models and the labels they predict.

    Visual NER Preannotation labels

    Known Limitations:

    • When preannotation is run in bulk on many tasks, it can fail due to memory issues.
    • Preannotation currently works at the token level and does not merge the tokens of a chunk into a single entity.
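As a possible client-side workaround for the token-level limitation, runs of adjacent tokens that carry the same label can be merged into one chunk. The sketch below assumes each predicted token comes with a label (or None) and character offsets into the OCR text; the Token and Chunk classes, the merge rule, and the ORG/DATE example labels are illustrative assumptions, not the Annotation Lab output format. This simple rule would also join two distinct entities of the same type that happen to be adjacent; telling those apart would need a BIO-style tagging scheme.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str
    label: Optional[str]  # None when the model predicted no entity
    start: int  # character offsets into the OCR text (assumed)
    end: int


@dataclass
class Chunk:
    text: str
    label: str
    start: int
    end: int


def merge_tokens(tokens: list[Token], source: str) -> list[Chunk]:
    """Merge runs of adjacent tokens that share a label into single chunks."""
    spans: list[list] = []  # each item is [label, start, end]
    prev_label: Optional[str] = None
    for tok in tokens:
        if tok.label is not None and tok.label == prev_label:
            spans[-1][2] = tok.end  # extend the current chunk
        elif tok.label is not None:
            spans.append([tok.label, tok.start, tok.end])  # open a new chunk
        prev_label = tok.label
    # Slice the source text so the original spacing between tokens is kept.
    return [Chunk(source[s:e], label, s, e) for label, s, e in spans]


source = "Invoice from John Snow Labs dated 2022"
tokens = [
    Token("Invoice", None, 0, 7),
    Token("from", None, 8, 12),
    Token("John", "ORG", 13, 17),
    Token("Snow", "ORG", 18, 22),
    Token("Labs", "ORG", 23, 27),
    Token("dated", None, 28, 33),
    Token("2022", "DATE", 34, 38),
]
for chunk in merge_tokens(tokens, source):
    print(chunk.label, repr(chunk.text))
# ORG 'John Snow Labs'
# DATE '2022'
```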

    Preannotation Server Specification

    The minimum configuration required for Visual NER preannotation is 16 GB RAM and a 4-core CPU.
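A quick way to compare a host against the two minimum specifications in this release note is a small lookup table. The figures come from the text above; the names MIN_SPECS and meets_spec, and the choice to pass RAM and core counts in as arguments rather than detecting them, are assumptions of this sketch.

```python
# Minimum host specs quoted in the Annotation Lab 3.4.0 release notes.
MIN_SPECS = {
    "training": {"ram_gb": 64, "cpu_cores": 16},
    "preannotation": {"ram_gb": 16, "cpu_cores": 4},
}


def meets_spec(workload: str, ram_gb: float, cpu_cores: int) -> bool:
    """True if the host has at least the documented RAM and CPU cores."""
    spec = MIN_SPECS[workload]  # raises KeyError for an unknown workload name
    return ram_gb >= spec["ram_gb"] and cpu_cores >= spec["cpu_cores"]


# A 32 GB / 16-core host is enough for preannotation but not for training.
print(meets_spec("training", ram_gb=32, cpu_cores=16))       # False
print(meets_spec("preannotation", ram_gb=32, cpu_cores=16))  # True
```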

    Spark NLP and Spark NLP for Healthcare upgrades

    Annotation Lab 3.4.0 uses Spark NLP 4.0.0, Spark NLP for Healthcare 4.0.2 and Spark OCR 3.13.0. With this upgrade, many new models are available in the Models Hub.

    Confusion Matrix for Classification Projects

    A new checkbox on the training page enables generation of a confusion matrix for classification projects. The confusion matrix is visible in the live training logs as well as in the downloaded training logs.
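For readers new to this output, a confusion matrix counts how often each gold label was predicted as each label, so correct predictions sit on the diagonal. The sketch below uses one common layout (rows are gold labels, columns are predicted labels); the layout Annotation Lab writes to its training logs may differ, and the example labels are made up.

```python
from __future__ import annotations

from collections import Counter


def confusion_matrix(gold: list[str], predicted: list[str], labels: list[str]) -> list[list[int]]:
    """Rows are gold labels, columns are predicted labels, both in `labels` order."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = Counter(zip(gold, predicted))  # (gold, predicted) pair -> count
    return [[counts[(g, p)] for p in labels] for g in labels]


labels = ["negative", "positive"]
gold = ["positive", "positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "negative", "positive", "positive"]
print(confusion_matrix(gold, pred, labels))  # [[1, 1], [1, 2]]
```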

    Confusion Matrix for Classification Projects
    Training logs showing the confusion matrix for a classification project

    Miscellaneous

    Project Import Improvements

    Until now, an imported project's name was taken from the name of the originally exported project. From this version on, the imported project is named after the imported zip file, which makes it easy to rename the project before importing it if necessary. In addition, users can now modify the contents of an exported zip file and re-zip it for import into Annotation Lab.
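The re-zip workflow can be scripted with Python's standard zipfile module. This is a sketch under assumptions: the entry names inside an exported project are not documented in this release note, so the hypothetical repack_export function simply copies every entry and swaps in new bytes for any entry named in replacements. Because the imported project takes its name from the zip file, the destination file name doubles as the new project name.

```python
from __future__ import annotations

import zipfile
from pathlib import Path
from typing import Optional


def repack_export(src_zip: Path, dst_zip: Path,
                  replacements: Optional[dict[str, bytes]] = None) -> None:
    """Copy an exported project zip to dst_zip, swapping in new bytes for selected entries.

    dst_zip must differ from src_zip. Names in `replacements` that are not
    present in the source archive are ignored.
    """
    replacements = replacements or {}
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(dst_zip, "w") as dst:
        for info in src.infolist():
            data = replacements.get(info.filename)
            if data is None:
                data = src.read(info.filename)
            # Reusing the ZipInfo keeps the entry name, timestamp and compression type.
            dst.writestr(info, data)


# Hypothetical usage: the new file name becomes the imported project's name.
# repack_export(Path("old_project.zip"), Path("Renamed Project.zip"))
```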

    Example of how to export and import zip files in Annotation Lab.

    Task Pagination on the Labeling page

    In earlier versions of the Annotation Lab, a task from a text-based project was paginated based on a predefined or custom number of words. This was problematic, especially for tasks that were imported after OCR and included special characters. From this version on, tasks are paginated based on the number of characters they contain.
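Character-based pagination can be sketched as follows. Annotation Lab's exact splitting rules are not described in this release note, so backing up to the last space or newline to avoid cutting a word in half is an assumption of this example, as is the function name paginate_by_chars. Joining the pages always reproduces the original text.

```python
from __future__ import annotations


def paginate_by_chars(text: str, page_size: int) -> list[str]:
    """Split text into pages of at most page_size characters.

    Where possible a page ends just after the last space or newline in the
    window, so words are not cut in half; a single word longer than
    page_size is split hard.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + page_size, len(text))
        if end < len(text):
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace on the current page
        pages.append(text[start:end])
        start = end
    return pages


print(paginate_by_chars("alpha beta gamma delta", 11))
# ['alpha beta ', 'gamma delta']
```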

    Confidence filter slider visible only for preannotation

    Previously, the confidence filter was applied to both predictions and completions. Since all manual annotations have a confidence score of 1, we decided to only show and apply the confidence filter when the prediction widget is selected.
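The filtering rule can be expressed in a few lines. The Annotation class, its kind field and the example labels are hypothetical; the point is that completions carry a confidence of 1 and therefore always pass any threshold, so the filter only ever changes what is shown for predictions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Annotation:
    label: str
    confidence: float  # manual completions always carry 1.0
    kind: str  # "prediction" or "completion" (hypothetical field)


def apply_confidence_filter(annotations: list[Annotation], threshold: float) -> list[Annotation]:
    """Hide predictions below the threshold; completions always pass."""
    return [a for a in annotations
            if a.kind == "completion" or a.confidence >= threshold]


items = [
    Annotation("PERSON", 0.42, "prediction"),
    Annotation("PERSON", 0.91, "prediction"),
    Annotation("ORG", 1.0, "completion"),
]
print([a.confidence for a in apply_confidence_filter(items, 0.5)])  # [0.91, 1.0]
```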

    Confidence filter for preannotation

    Swagger Docs Changes

    The API docs have been restructured for easier use, and new methods have been added to mirror the new functionality offered in the UI.

    Swagger Docs Changes

    Confidence score for Rules preannotations

    The confidence of rule-based preannotations is now visible on the Labeling screen, the same as that of model-based preannotation.

    Get & install it HERE.

    Full feature set HERE.

    Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then shifted her focus to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has rich experience working with different annotation tools and leading document classification and NER extraction projects in verticals such as Finance, Investment, Banking, and Healthcare.
