Configuring Automated Backups and Model Training Resources in the Annotation Lab

01.04.2022

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/

Two new tabs have been added to the Settings page: one for defining the infrastructure used by the prediction and training tasks, and one for defining backup schedules.

Resource allocation for Training and Preannotation

Since release 2.8.0, Annotation Lab lets users change the configuration of the training and preannotation processes. This is done from the Settings page > Infrastructure tab. Admin users can edit the settings; they are read-only for all other users. The Infrastructure tab consists of three sections: Training Resources, Preannotation Server Resources, and Preannotation Pipeline Resources.

The available settings include:

Memory Limit – Represents the maximum memory size to allocate for the training/preannotation processes.
CPU Limit – Specifies the maximum number of CPUs that the training/preannotation server can use.
Spark Driver Memory – Defines the memory allocated for the Spark driver.
Spark Kryo Buff Max – Specifies the maximum memory size to allocate for the Kryo serialization buffer.
Spark Driver Maximum Result Size – Represents the limit on the total size of the serialized results of all partitions for each Spark action.

NOTE: If the specified configurations exceed the available resources, the server will not start.
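As a rough illustration of the settings and the note above, the sketch below maps the three Spark fields to the standard Spark property names they appear to correspond to (an assumption; the post does not show how Annotation Lab passes them to Spark) and checks whether a requested memory and CPU limit fits within what is available. The helper names parse_size, build_spark_conf, and fits_available are hypothetical and are not part of Annotation Lab.

```python
import re

# Multipliers for the size suffixes accepted by this sketch (binary units).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size string such as '512m' or '4g' into bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def build_spark_conf(driver_memory: str, kryo_buffer_max: str, max_result_size: str) -> dict:
    """Map the three Spark-related fields to standard Spark property names.

    Assumption: the UI fields correspond to these properties.
    """
    return {
        "spark.driver.memory": driver_memory,
        "spark.kryoserializer.buffer.max": kryo_buffer_max,
        "spark.driver.maxResultSize": max_result_size,
    }


def fits_available(memory_limit: str, cpu_limit: int,
                   available_memory: str, available_cpus: int) -> bool:
    """Return True when the requested limits do not exceed the available resources."""
    return (parse_size(memory_limit) <= parse_size(available_memory)
            and cpu_limit <= available_cpus)


if __name__ == "__main__":
    print(build_spark_conf("4g", "2000m", "4g"))
    print(fits_available("8g", 4, "16g", 8))
    print(fits_available("32g", 4, "16g", 8))
```

A check of this kind, run before saving new values, can help avoid a configuration that prevents the server from starting.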

Backup settings in UI

In this release, Annotation Lab adds support for defining database and file backups via the UI. Any user with the admin role can view and edit the backup settings on the Settings page. Admins can select a backup period and specify a target S3 bucket for storing the backup files. New backups are then generated automatically and saved to the S3 bucket according to the defined schedule.
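To make the scheduling idea concrete, here is a minimal sketch of how a backup period could translate into the next run time and an object key in the target bucket. The period names, the key layout, and the functions next_backup and backup_key are all hypothetical; the post does not list the periods offered in the UI or describe how Annotation Lab names backup files.

```python
from datetime import datetime, timedelta

# Hypothetical period names; the UI options are not listed in the post.
PERIODS = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def next_backup(last_backup: datetime, period: str) -> datetime:
    """Return when the next backup is due for the given period name."""
    return last_backup + PERIODS[period]


def backup_key(kind: str, taken_at: datetime) -> str:
    """Build an illustrative S3 object key for a backup of the given kind."""
    return f"backups/{kind}/{taken_at:%Y-%m-%dT%H%M%S}.bak"


if __name__ == "__main__":
    last = datetime(2022, 4, 1, 2, 0, 0)
    print(next_backup(last, "daily"))
    print(backup_key("database", last))
```

Including a timestamp in the key keeps each scheduled backup as a separate object rather than overwriting the previous one.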

Stay tuned for more exciting features!
