Rule-based annotations in the Annotation Lab

20.12.2021

Nabin Khadka

Data Scientist at John Snow Labs

A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/

The Annotation Lab supports automated rule-based annotation, which enables programmatic labeling when training NLP models. You can combine rule-based and model-based automated annotations, and share and reuse rules across your team.

Rule-Based Annotations

Spark NLP for Healthcare supports rule-based annotations via the ContextualParser annotator. In this release, the Annotation Lab adds support for creating and using ContextualParser rules in NER projects.
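For readers who have not used ContextualParser directly, the sketch below shows roughly what a rule definition might look like when written out as a JSON file. The field names (entity, ruleScope, matchScope, regex) are recalled from the ContextualParser documentation and should be treated as assumptions to verify against your installed Spark NLP for Healthcare version. The write_rule helper is hypothetical and exists only for this illustration; the Annotation Lab creates rules through its UI, not through this code.

```python
import json
import tempfile
from pathlib import Path

# Assumed ContextualParser-style rule definition. Field names are recalled
# from the library documentation and are not confirmed by this article.
height_rule = {
    "entity": "Height",        # label assigned to every hit
    "ruleScope": "sentence",   # assumed: where the rule is evaluated
    "matchScope": "token",     # assumed: match whole tokens
    "regex": "[0-7]'((0?[0-9])|(1(0|1)))",  # height regex used in this article
}


def write_rule(rule: dict, directory: str) -> Path:
    """Hypothetical helper: write one rule to <directory>/<entity>.json."""
    path = Path(directory) / f"{rule['entity'].lower()}.json"
    path.write_text(json.dumps(rule, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    rule_path = write_rule(height_rule, tmp)
    # Read the file back to confirm it round-trips as valid JSON.
    loaded = json.loads(rule_path.read_text(encoding="utf-8"))
```

Keeping one rule per file, named after its entity, is only a convention chosen for this sketch.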

Any user with admin privileges can see and edit the available rules under the Available Rules tab on the Models Hub page. Users can create new rules using the + Add Rules button.

There are two types of rules supported:

Regex Based: Users can define a regex, and every chunk of task text that matches it is labeled as the target entity. For example, the regex [0-7]'((0?[0-9])|(1(0|1))) can be used to label height entities: all matches found in the task text are pre-annotated as heights.
Dictionary Based: Users can define and upload a CSV dictionary of keywords covering the chunks that should be annotated as the target entity. For example, if the dictionary for the label female contains woman, lady, and girl, all occurrences of the strings woman, lady, and girl in the text of a given task are pre-annotated as female.
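The two rule types can be illustrated with a small, standalone Python sketch. This is not how the Annotation Lab or ContextualParser implements rules; it only mimics the behavior described above using the standard library. The Span type, the function names, and the CSV layout (label first, then keywords) are assumptions made for the example. One more assumption: Python's re module tries alternatives from left to right, so the article's regex applied as-is would match only 5'1 inside 5'11. The sketch therefore appends a (?!\d) lookahead, which is not part of the original rule.

```python
import csv
import io
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    text: str
    label: str


# Regex taken from the article, unchanged.
HEIGHT_REGEX = r"[0-7]'((0?[0-9])|(1(0|1)))"


def regex_rule(text: str, pattern: str, label: str) -> list[Span]:
    """Label every regex hit in the text with the target entity."""
    # (?!\d) is an addition for this sketch so that "5'11" is not cut to "5'1".
    compiled = re.compile(pattern + r"(?!\d)")
    return [Span(m.start(), m.end(), m.group(0), label)
            for m in compiled.finditer(text)]


def load_dictionary(csv_text: str) -> dict[str, list[str]]:
    """Parse rows of the assumed form: label, keyword, keyword, ..."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return {row[0]: [kw for kw in row[1:] if kw] for row in rows if row}


def dictionary_rule(text: str, keywords: list[str], label: str) -> list[Span]:
    """Label every whole-word, case-insensitive keyword occurrence."""
    if not keywords:
        return []
    # Longest keywords first, so a longer keyword wins over its own prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b"
    return [Span(m.start(), m.end(), m.group(0), label)
            for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


text = "The lady is 5'11 tall; the girl next to her is 4'9."
heights = regex_rule(text, HEIGHT_REGEX, "height")
dictionary = load_dictionary("female, woman, lady, girl\n")
females = dictionary_rule(text, dictionary["female"], "female")
```

Here heights holds the chunks 5'11 and 4'9, and females holds lady and girl, each with character offsets into the task text. Whether the real tool matches case-insensitively or on whole words is not stated in the article, so both choices are illustrative.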

After a rule has been added on the Models Hub page, the Project Owner or Manager can add it to the configuration of the project where they want to use it. This is done via the Rules tab on the Project Setup page, under the Project Configuration tab. A valid Spark NLP for Healthcare license is required to deploy rules from the project configuration.

The rules can be used by themselves or in combination with one or more NER models. After the rules are deployed, the Project Owner or Manager selects one or more tasks on the Tasks page and pre-annotates them by pressing the Preannotation button.

Nabin Khadka leads the team building the Annotation Lab at John Snow Labs. He has 7 years of experience as a software engineer, covering a broad range of technologies from web & mobile apps to distributed systems and large-scale machine learning.