Support for Free NER Rules in NLP Lab 5.5

05.12.2023

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

NLP Lab 5.5 comes with support for free rule-based NER annotation, which is handy for pre-annotation scenarios where context can be ignored (e.g., identifying email addresses, telephone numbers, or lists of cities and countries).

This feature extends John Snow Labs' NLP Lab by letting users define and apply custom rules for pre-annotation, which can significantly accelerate the data annotation process.

Supported Rules

Regex-Based: Users can define a regex that identifies all matching chunks in the task text and labels them as the target entity. For example, the following regex can be used to label height entities:

[0-7]'((1(0|1))|(0?[0-9]))

All chunks in the task text that match the regex are pre-annotated as heights.
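As an illustration of what a regex-based rule does with a task text, the following Python sketch uses the standard `re` module to collect every hit together with its character offsets. The helper name `pre_annotate_regex` and the (start, end, chunk, label) tuples are hypothetical and do not reflect NLP Lab's internal format. Note that the inch alternatives list 10 and 11 first: Python's `re`, like other backtracking regex engines, accepts the first alternative that succeeds, so with `0?[0-9]` first a height such as 5'11 would be annotated only as 5'1.

```python
import re

# Height regex: feet 0-7, an apostrophe, then inches 10 or 11,
# or 0-9 (optionally zero-padded). Two-digit inches are tried first.
HEIGHT_REGEX = r"[0-7]'((1(0|1))|(0?[0-9]))"

def pre_annotate_regex(text, pattern, label):
    """Return (start, end, chunk, label) for every non-overlapping regex hit.

    Hypothetical helper illustrating regex-based pre-annotation;
    it is not the NLP Lab implementation.
    """
    return [(m.start(), m.end(), m.group(0), label)
            for m in re.finditer(pattern, text)]

text = "The patient is 5'11 tall; her brother is 6'2."
for start, end, chunk, label in pre_annotate_regex(text, HEIGHT_REGEX, "Height"):
    print(start, end, chunk, label)
```

Running it prints `15 19 5'11 Height` and `41 44 6'2 Height`.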

Dictionary-Based: Users can define and upload a CSV dictionary of keywords listing the tokens that should be annotated as the target entity. For example, if the dictionary for the label female contains woman, lady, and girl, all occurrences of these strings in the text of a given task will be pre-annotated as female. Make sure to include all possible forms of the target entity in the dictionary, as only the listed values will be pre-annotated.
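A minimal Python sketch of dictionary-based pre-annotation follows. It assumes a one-column CSV with one keyword per row; the exact file layout NLP Lab expects is not described here, so treat that as an assumption. Whole-word, case-sensitive matching is likewise a choice made for this illustration, not documented NLP Lab behavior, and the helper names are hypothetical. The example also shows why every form must be listed: "women" is not annotated because only "woman" is in the dictionary.

```python
import csv
import io
import re

# Hypothetical dictionary for the label "Female": one keyword per row.
# io.StringIO stands in for an uploaded CSV file so the example is self-contained.
DICTIONARY_CSV = "woman\nlady\ngirl\n"

def load_keywords(csv_file):
    """Read non-empty keywords from the first column of a CSV file."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]

def pre_annotate_dictionary(text, keywords, label):
    """Return (start, end, chunk, label) for each whole-word keyword occurrence."""
    if not keywords:
        return []
    # Longest keywords first, so overlapping entries prefer the longer match.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in alternatives) + r")\b"
    return [(m.start(), m.end(), m.group(0), label)
            for m in re.finditer(pattern, text)]

keywords = load_keywords(io.StringIO(DICTIONARY_CSV))
text = "A girl and a lady met two women."
print(pre_annotate_dictionary(text, keywords, "Female"))
```

This prints `[(2, 6, 'girl', 'Female'), (13, 17, 'lady', 'Female')]`; adding "women" to the CSV would be required to annotate the plural.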

Limitations

Free rules do not take context into account: they recognize and annotate text based solely on the defined regex patterns or dictionary entries. While this is effective in many scenarios, it can lead to errors in ambiguous cases where context is needed for accurate annotation. Contextual rules are available with a healthcare license key.

Any admin user can view and edit the rules under the Available Rules tab on the Models Hub page, and can create new rules using the + Add Rules button.

After a rule is created on the Models Hub page, the Project Owner or Manager can add it to the configuration of the project where it will be used. This is done from the ‘Rules’ tab on the Project Setup page, located within the ‘Project Configuration’ tab. Newly created rules can also be tested in the playground.

Note:

Free rules do not have access to context information such as “Prefix Keyword”, “Suffix Keyword”, and “Rule/Match Scope”.
If contextual rules are used in active projects with a healthcare license key and the license expires, the rules remain operational but no longer take context information into account.
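To illustrate why context matters, the Python sketch below contrasts a context-free dictionary match with a hypothetical prefix-keyword check. The check (keep a hit only if one of the given keywords appears among the few words right before it), the two-word window, and the helper names are assumptions made for illustration; this is not how NLP Lab's contextual rules are implemented, and Rule/Match Scope is not modeled.

```python
import re

def free_matches(text, keywords):
    """Context-free rule: every whole-word occurrence of a keyword is a hit."""
    pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]

def with_prefix_keywords(text, hits, prefix_keywords, window=2):
    """Hypothetical contextual filter: keep a hit only if a prefix keyword
    occurs among the `window` words immediately before it (case-insensitive)."""
    prefixes = {k.lower() for k in prefix_keywords}
    kept = []
    for start, end, chunk in hits:
        preceding = re.findall(r"\w+", text[:start].lower())[-window:]
        if prefixes.intersection(preceding):
            kept.append((start, end, chunk))
    return kept

text = "She lives in Paris. Yesterday Paris Hilton visited."
hits = free_matches(text, ["Paris"])  # a city dictionary with one entry
print(hits)
print(with_prefix_keywords(text, hits, ["in", "to", "from"]))
```

The free rule labels both occurrences of "Paris" as a city, including the one in "Paris Hilton"; the prefix-keyword filter keeps only the first, because "in" precedes it.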

Rules are easy to integrate into project workflows, where they can be tested, modified, and applied. This keeps NLP Lab a flexible and powerful tool for a wide range of NLP tasks.

Conclusion

Free rule-based NER annotation gives users across various domains a versatile and efficient way to pre-annotate data. Such rules are highly effective for straightforward cases, such as identifying email addresses or specific lists of words, but may fall short in scenarios where context plays a crucial role in interpretation. For more nuanced, context-dependent cases, especially in specialized fields like healthcare, the contextual rules available with a healthcare license key offer an additional layer of precision and adaptability.

As we continue to enhance NLP Lab, we remain committed to providing tools that are not only technologically advanced but also user-centric, catering to the evolving needs of our diverse user base. We are excited to see how our community will utilize these new features in NLP Lab 5.5 to drive innovation and efficiency in their NLP endeavors.

Getting Started is Easy

The NLP Lab is a free text annotation tool that can be deployed in a couple of clicks on the AWS, Azure, or OCI Marketplaces or installed on-premise with a one-line Kubernetes script.
Get started here: https://nlp.johnsnowlabs.com/docs/en/alab/install

Start your journey with NLP Lab and experience the future of data analysis and model training today!

Dia Trambitas is an AI Product Manager with deep expertise in Natural Language Processing and applied Generative AI. At John Snow Labs, Dia has led the development of the Generative AI Lab — a no-code platform for data annotation and model training — as well as the Medical Chatbot, a secure and domain-specific conversational AI assistant tailored for clinical environments. With a strong focus on practical deployments of cutting-edge AI, she has worked at the intersection of healthcare and technology, driving product innovation that empowers users to harness large language models safely and effectively. Passionate about transforming unstructured data into actionable insights, Dia brings a strategic and user-centered approach to building AI tools that are both powerful and accessible.
