NLP Lab 5.5 comes with support for FREE rule-based NER annotation, handy for cases where context can be ignored in pre-annotation scenarios (e.g. identification of email addresses, telephone numbers, list of cities, countries, etc.).
This feature extends the John Snow Labs product by letting users define and apply custom rules for pre-annotation, which can significantly accelerate the data annotation process.
Regex-Based: Users can define a regex that identifies every matching chunk of text and labels it as the target entity. For example, for labeling height entities the following regex can be used:
All hits found in the task text that match the regex are pre-annotated as heights.
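As a rough illustration of how regex-based pre-annotation behaves, the sketch below applies a pattern to a task text and turns every hit into a labeled span. The pattern `HEIGHT_PATTERN`, the function `regex_preannotate`, and the span dictionary layout are hypothetical names chosen for this example; they are not the regex from the release notes and not NLP Lab's internal implementation.

```python
import re

# Hypothetical pattern for illustration only: a number followed by a length unit.
# This is NOT the regex shipped with or documented for NLP Lab.
HEIGHT_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?(?:cm|m|ft|in)\b")


def regex_preannotate(text, pattern, label):
    """Return one span per regex hit; surrounding context is never consulted."""
    return [
        {"start": m.start(), "end": m.end(), "text": m.group(), "label": label}
        for m in pattern.finditer(text)
    ]


task_text = "The patient is 172 cm tall and grew 3 in last year."
spans = regex_preannotate(task_text, HEIGHT_PATTERN, "height")
for span in spans:
    print(span)
```

Because the match ignores context, the same pattern would also label a phrase such as "a 5 m rope" as a height, which is the kind of ambiguity a context-free rule cannot resolve.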
Dictionary-Based: Users can define and upload a CSV dictionary of keywords listing the tokens that should be annotated as a target entity. For example, if the dictionary for the label female contains woman, lady, and girl, then all occurrences of the strings woman, lady, and girl within the text of a given task will be pre-annotated as female. Make sure to include all possible forms of the target entity in the dictionary, as only values in the list will be pre-annotated.
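The dictionary-based behavior can be sketched in a similar way. The example below assumes a CSV in which every non-empty cell is one keyword, and it assumes case-sensitive, whole-word matching; the source does not specify the CSV layout NLP Lab expects or how it matches case, so treat both as assumptions. `load_keywords` and `dictionary_preannotate` are hypothetical helper names.

```python
import csv
import io
import re


def load_keywords(csv_text):
    """Read keywords from CSV text; every non-empty cell counts as one keyword (assumed layout)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [cell.strip() for row in reader for cell in row if cell.strip()]


def dictionary_preannotate(text, keywords, label):
    """Label every whole-word occurrence of a keyword; only listed forms are matched."""
    if not keywords:
        return []
    # Longest keywords first so multi-word entries win over their shorter prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b")
    return [
        {"start": m.start(), "end": m.end(), "text": m.group(), "label": label}
        for m in pattern.finditer(text)
    ]


keywords = load_keywords("woman,lady,girl\n")
spans = dictionary_preannotate(
    "The lady spoke to a girl; two women waited.", keywords, "female"
)
for span in spans:
    print(span)
```

In this sketch "women" is not annotated because it is not in the dictionary, which is why every form of the target entity needs its own entry.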
The Free version of rules operates without support for contextual understanding. Free rules recognize and annotate text based solely on the defined regex patterns or dictionary entries. While this is effective for many scenarios, it limits how well they handle ambiguous cases where context is crucial for accurate annotation. Contextual rules are available in the presence of a healthcare license key.
Any admin user can see and edit the rules under the Available Rules tab on the Models Hub page. Users can create new rules using the + Add Rules button.
After creating a rule on the Models Hub page, the Project Owner or Manager can incorporate the rule into the project’s configuration where it is intended for use. This can be accomplished through the ‘Rules’ tab on the Project Setup page, located within the ‘Project Configuration’ tab. Additionally, the newly created rules can be tested in the playground.
- Free rules do not have access to context information like “Prefix Keyword,” “Suffix Keyword,” and Rule/Match Scope.
- If contextual rules are used within active projects in the presence of a healthcare license key and the latter expires, the rules can remain operational but will no longer take context information into account.
Integrating these rules into project workflows is streamlined and user-friendly, allowing for easy testing, modification, and application. This ensures that NLP Lab continues to be a flexible and powerful tool for a wide range of NLP tasks.
The introduction of free rule-based NER annotation offers versatility and efficiency to users across various domains. While free rules are highly effective for straightforward cases, such as identifying email addresses or specific lists of words, their application might be limited in scenarios where context plays a crucial role in interpretation. For more nuanced and context-dependent cases, especially in specialized fields like healthcare, contextual rules, available with a healthcare license key, offer an additional layer of precision and adaptability.
As we continue to enhance NLP Lab, we remain committed to providing tools that are not only technologically advanced but also user-centric, catering to the evolving needs of our diverse user base. We are excited to see how our community will utilize these new features in NLP Lab 5.5 to drive innovation and efficiency in their NLP endeavors.
Getting Started is Easy
The NLP Lab is a free text annotation tool that can be deployed in a couple of clicks on the AWS, Azure, or OCI Marketplaces or installed on-premise with a one-line Kubernetes script.
Get started here: https://nlp.johnsnowlabs.com/docs/en/alab/install
Start your journey with NLP Lab and experience the future of data analysis and model training today!