Searching within long text documents, PDFs, or images lets users locate the information they need without manually scrolling through the entire document. PDFs in particular often contain large amounts of information and can be difficult to navigate, so search functionality saves time. If the goal is to extract data from PDFs, the Visual NLP tool is a suitable choice.
Task Search by Text, Label, and Choice
NLP Lab offers advanced search features that help users identify the tasks they need, based either on the task text or on the annotations defined so far. The currently supported search queries are:
- text: patient -> returns all tasks which contain the string “patient”;
- label: ABC -> returns all tasks that have at least one completion containing a chunk with label ABC;
- label: ABC=DEF -> returns all tasks that have at least one completion containing the text DEF labeled as ABC;
- choice: Sport -> returns all tasks that have at least one completion which classified the task as Sport;
- choice: Sport, Politics -> returns all tasks that have at least one completion containing both choices, Sport and Politics.
Search functionality is case insensitive, so the following queries are considered equivalent:
- label: ABC=DEF
- label: Abc=Def
- label: abc=def
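The query semantics described above can be illustrated with a minimal Python sketch. This is not NLP Lab's implementation: the task structure (a dict with "text" and "completions", each completion holding "chunks" and "choices") and the names parse_query and matches are hypothetical, chosen only for illustration. The sketch also assumes that label: ABC=DEF requires the chunk text to equal DEF exactly, apart from letter case.

```python
def parse_query(query):
    """Split 'kind: value' into a lowercased (kind, value) pair."""
    kind, _, value = query.partition(":")
    return kind.strip().lower(), value.strip().lower()


def matches(task, query):
    """Return True if the (hypothetical) task dict satisfies the query."""
    kind, value = parse_query(query)
    if kind == "text":
        # text: patient -> the task text contains the string
        return value in task["text"].lower()

    completions = task.get("completions", [])
    if kind == "label":
        # label: ABC      -> any chunk labeled ABC
        # label: ABC=DEF  -> a chunk labeled ABC whose text is DEF (assumed exact match)
        label, _, chunk_text = (part.strip() for part in value.partition("="))
        for completion in completions:
            for chunk in completion.get("chunks", []):
                if chunk["label"].lower() != label:
                    continue
                if not chunk_text or chunk["text"].lower() == chunk_text:
                    return True
        return False

    if kind == "choice":
        # choice: Sport, Politics -> one completion must contain every listed choice
        wanted = {c.strip() for c in value.split(",") if c.strip()}
        return any(
            wanted <= {c.lower() for c in completion.get("choices", [])}
            for completion in completions
        )

    raise ValueError(f"Unsupported query type: {kind!r}")


# Hypothetical task used only to exercise the sketch.
task = {
    "text": "The patient mentioned DEF during the visit.",
    "completions": [
        {
            "chunks": [{"label": "ABC", "text": "DEF"}],
            "choices": ["Sport", "Politics"],
        }
    ],
}

for q in ["text: patient", "label: abc=def", "choice: Sport, Politics"]:
    print(q, "->", matches(task, q))
```

Lowercasing both the query and the stored values is what makes label: ABC=DEF, label: Abc=Def, and label: abc=def behave identically in this sketch.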
Keyword-based Search at Task Level
NLP Lab supports task-level keyword-based searches. The keyword-based search feature works for text and Visual NER projects alike.
- The search covers all pages of a paginated task.
- Users can navigate between search results, even when a result is located on another page.
In the NLP Annotation Lab, the search feature was implemented with the help of an HTML tag added to the Visual NER project configuration. Now that NLP Lab provides task-level search, that tag should be removed from existing Visual NER projects.
Config to be removed from all existing Visual NER projects:
<Search name="search" toName="image" placeholder="Search"/>
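For projects with many configurations to clean up, the removal could be scripted. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name remove_search_tags and the sample configuration (a View wrapper containing an Image tag) are assumptions made for illustration and are not taken from an actual project. How a configuration is exported from or saved back to NLP Lab is not covered here.

```python
import xml.etree.ElementTree as ET


def remove_search_tags(config_xml: str) -> str:
    """Return the configuration with every <Search> element removed."""
    root = ET.fromstring(config_xml)
    # Walk every element and drop any direct child named "Search".
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Search":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical Visual NER configuration, used only to exercise the sketch.
sample_config = """
<View>
  <Image name="image" value="$image"/>
  <Search name="search" toName="image" placeholder="Search"/>
</View>
"""

cleaned = remove_search_tags(sample_config)
print(cleaned)
```

Note that ElementTree re-serializes the document, so whitespace and self-closing tag spacing may differ slightly from the original even though the remaining elements and attributes are unchanged.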
Keyword-based search in text tasks.
Keyword-based search in PDF/image tasks.
Chunk-based Search in Visual NER Tasks
In previous versions, users could only run token-based searches at page level. The search feature did not support searching for a sequence of tokens as a single chunk. With this release, users can search for a multi-token chunk in a Visual NER task.
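The difference between token-based and chunk-based search can be sketched in a few lines of Python. This is an illustration only and does not reflect how NLP Lab implements the feature: the function find_chunk and the token list are hypothetical, and case-insensitive matching is assumed. A single-word query behaves like a token search, while a multi-word query matches only where the tokens appear consecutively and in order.

```python
def find_chunk(tokens, query):
    """Return the start index of every occurrence of the query's
    tokens as a consecutive, in-order run within `tokens`."""
    wanted = [word.lower() for word in query.split()]
    lowered = [token.lower() for token in tokens]
    n = len(wanted)
    if n == 0:
        return []
    return [
        i
        for i in range(len(lowered) - n + 1)
        if lowered[i:i + n] == wanted
    ]


# Hypothetical page tokens, e.g. as produced by OCR on one page.
page_tokens = ["Patient", "has", "chest", "pain", "and", "chest", "tightness"]

print(find_chunk(page_tokens, "chest"))       # token-level search
print(find_chunk(page_tokens, "chest pain"))  # chunk-level search
```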
Getting Started is Easy
NLP Lab is a free tool that can be deployed in a couple of clicks from the AWS and Azure Marketplaces, or installed on-premises with a one-line Kubernetes script. Get started here: https://nlp.johnsnowlabs.com/docs/en/alab/install