Support for Multilingual and European Languages in the Annotation Lab

30.08.2022

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/

Annotation Lab 3.5.0 adds support for out-of-the-box usage of Multilingual Models as well as support for some of the European Language Models: Romanian, Portuguese, Danish and Italian. It also provides support for split dataset using Test/Train tags in classification project and allows NER pretrained models evaluation with floating license. The release also includes fixes for known security vulnerabilities and for some bugs reported by our user community.

Here are the highlights of this release:

Support for Multilingual Models

In this release, some improvements were added to the Models Hub page to include multilingual models. Previously, only multilingual embeddings were available on the Hub page of NLP Models. A new language filter has been added to the Models hub page to make searching for all available multilingual models and embeddings more efficient. User can select the target language and then explore the set of relevant multilingual models and embeddings. When a relevant model is discovered and downloaded to Annotation Lab, the suffix xx will be added to the model’s name.

Expended Support for European Language Models

Annotation Lab now offers support for four new European languages Romanian, Portuguese, Italian, and Danish, on top of English, Spanish, and German, already supported in previous versions. Many pretrained models in those languages are now available to download from the NLP Models Hub and easily use to preannotate documents on the Annotation Lab. Users can also upload models in those languages to the Annotation Lab and use them for preannotation in new projects or tune them to better suite their data.

Use Test/Train Tags for Classification Training Experiments

With version 3.5.0, the Test/Train split of annotated tasks can be used when training classification models. When this option is checked on the Training Settings, all tasks that have the Test tag are used as test datasets. All tasks tagged as Train together with all other non Test tasks will be used as a training dataset. For instance, when “Split dataset using Test/Train tags” is enabled, on a project with a total of 94 tasks, out of which 46 tasks have the Test tag, the 48 tasks that do not have the Test tag will be used as train tasks while the 46 Test tasks will be used for testing the model’s performance.

NER Model Evaluation available for Floating License

Starting with version 3.5.0, Project Owner and/or Manager can evaluate pretrained NER models against a set of annotated tasks in the presence of floating licenses. Earlier, this feature was only available in the presence of airgap licenses. The NER models included in the current project configuration are tested against the tasks tagged as Test. The performance metrics can be checked and downloaded as evaluation logs.

Chunks preannotation in VisualNER

Annotation Lab 3.4.0 which first published the visual NER preannotation and visual NER model training could only create token level preannotations. With version 3.5.0, individual tokens are combined into one chunk entity and shown as merged to the user.

Benchmarking Information for Models Trained with Annotation Lab

Earlier versions of Annotation Lab provided benchmarking information for models published by John Snow Labs on the NLP Models Hub. With version 3.5.0 benchmarking information is available for models trained within Annotation Lab. User can go to the Available Models Tab of the Models Hub page and view the benchmarking data by clicking the small graph icon next to the model.

Configuration for Annotation Lab Deployment

The resources allocated to Annotation Lab deployment can be configured via the resource values in the annotationlab-updater.sh. Users can follow the steps below to change the default resource allocation:

Go to the end of annotationlab-updater.sh file
Set the CPU and memory for both limits and requests as per the requirement

Example:

--set resources.limits.cpu="8000m"\
        --set resources.limits.memory="8000Mi" \
        --set resources.requests.cpu="4000m"  \
        --set resources.requests.memory="4000Mi" \

NOTE:The resource allocation needs to be defined according to the type of instance used for the deployment.

Security fixes

As part of our release process we run CVE scans and upgrade the python packages to latest versions in order to eliminate known vulnerabilities. A high number of Common Vulnerabilities and Exposures(CVE) issues are fixed in this version.

Get & Install It Here.

Full Feature Set Here.

Try The Generative AI Lab - No-Code Platform For Model Tuning & Validation

See in action

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

Our additional expert:

Dia Trambitas is an AI Product Manager with deep expertise in Natural Language Processing and applied Generative AI. At John Snow Labs, Dia has led the development of the Generative AI Lab — a no-code platform for data annotation and model training — as well as the Medical Chatbot, a secure and domain-specific conversational AI assistant tailored for clinical environments. With a strong focus on practical deployments of cutting-edge AI, she has worked at the intersection of healthcare and technology, driving product innovation that empowers users to harness large language models safely and effectively. Passionate about transforming unstructured data into actionable insights, Dia brings a strategic and user-centered approach to building AI tools that are both powerful and accessible.

Improving Annotation Quality using Analytics in the Annotation Lab

Ph.D. Andrei Feier

A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/ The Analytics feature offered...

Support for Multilingual and European Languages in the Annotation Lab

Support for Multilingual Models

Expended Support for European Language Models

Use Test/Train Tags for Classification Training Experiments

NER Model Evaluation available for Floating License

Chunks preannotation in VisualNER

Benchmarking Information for Models Trained with Annotation Lab

Configuration for Annotation Lab Deployment

Security fixes

Get & Install It Here.

Full Feature Set Here.

Improving Annotation Quality using Analytics in the Annotation Lab

Recommended For You