The Annotation Lab supports automated rule-based annotation, enabling programmatic labeling when training NLP models. You can combine rule-based and model-based automated annotations, and share and reuse rules across your team.
Spark NLP for Healthcare supports rule-based annotation via the ContextualParser annotator. In this release, the Annotation Lab adds support for creating and using ContextualParser rules in NER projects.
Any user with admin privileges can see and edit the available rules under the Available Rules tab on the Models Hub page. Users can create new rules using the + Add Rules button.
There are two types of rules supported:
- Regex Based: Users can define a regex that is used to find all matching chunks and label them as the target entity. For example, the following regex can be used to label height entities: `[0-7]'((0?[0-9])|(1(0|1)))`. All chunks in the task text that match the regex are preannotated as heights.
- Dictionary Based: Users can define and upload a CSV dictionary of keywords that lists the chunks to annotate as the target entity. For example, for the label female with the keywords woman, lady, girl, all occurrences of the strings woman, lady, and girl within the text of a given task will be preannotated as female.
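
To make the two rule types concrete, the following is a minimal sketch in plain Python of the matching each one performs. It is an illustration only and does not use the ContextualParser API or the Annotation Lab itself. The helper names (`regex_rule`, `dictionary_rule`, `load_dictionary`), the single-row CSV layout, and the whole-word, case-insensitive dictionary matching are assumptions made for this example.

```python
import csv
import io
import re

# Regex-based rule: the height pattern from the example above.
HEIGHT_PATTERN = re.compile(r"[0-7]'((0?[0-9])|(1(0|1)))")


def regex_rule(text, label, pattern):
    """Return (start, end, chunk, label) for every regex hit in the text."""
    return [(m.start(), m.end(), m.group(0), label) for m in pattern.finditer(text)]


def load_dictionary(csv_text):
    """Read keywords from CSV content (assumed layout: comma-separated keywords)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [cell.strip() for row in reader for cell in row if cell.strip()]


def dictionary_rule(text, label, keywords):
    """Return (start, end, chunk, label) for every keyword occurrence.

    Whole-word, case-insensitive matching is an assumption of this sketch.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0), label) for m in pattern.finditer(text)]


text = "The lady is 5'07 tall; the girl next to her is 4'9."

height_hits = regex_rule(text, "height", HEIGHT_PATTERN)
female_hits = dictionary_rule(text, "female", load_dictionary("woman,lady,girl"))

for hit in height_hits + female_hits:
    print(hit)
```

Each hit carries the character offsets of the chunk and the label it would be preannotated with, which is roughly the information a preannotation needs to highlight the chunk in the task text.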
After a rule is added on the Models Hub page, the Project Owner or Manager can add it to the configuration of the project where it will be used. This is done via the Rules tab on the Project Setup page, under the Project Configuration tab. A valid Spark NLP for Healthcare license is required to deploy rules from the project configuration.
Rules can be used by themselves or in combination with one or more NER models. Once the rules are deployed, the Project Owner or Manager can select one or more tasks on the Tasks page and preannotate them by pressing the Preannotation button.
If you work with PDFs or other image types, you can also try the image annotation tool.