Keyword-based search for text, images, and PDFs in the NLP Lab

26.01.2023

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

Searching for specific information within long text or PDF documents, or within images, lets users locate what they need quickly instead of manually scrolling through the entire document. PDFs in particular often contain large amounts of information and can be difficult to navigate, so search functionality helps users find the relevant passages within the document. If the goal is to extract data from PDFs, the Visual NLP tool is suitable.

Task Search by Text, Label, and Choice

NLP Lab offers advanced search features that help users identify the tasks they need based on the task text or on the annotations defined so far. The currently supported search queries are:

text: patient -> returns all tasks which contain the string “patient”;
label: ABC -> returns all tasks that have at least one completion containing a chunk with label ABC;
label: ABC=DEF -> returns all tasks that have at least one completion containing the text DEF labeled as ABC;
choice: Sport -> returns all tasks that have at least one completion which classified the task as Sport;
choice: Sport, Politics -> returns all tasks that have at least one completion containing both choices, Sport and Politics.

Search is case-insensitive, so the queries label: ABC=DEF, label: Abc=Def, and label: abc=def are considered equivalent.
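The query forms listed above can be illustrated with a small matcher. This is only a sketch of the described behavior, not NLP Lab's implementation: the Task class, the completion dictionaries with "chunks" and "choices" keys, and the functions parse_query and matches are all hypothetical names introduced for illustration. It also assumes that label: ABC=DEF requires the chunk text to equal DEF exactly (ignoring case), which the description does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task; not NLP Lab's data model."""
    text: str
    # Each completion is a dict with two assumed keys:
    #   "chunks":  list of (label, chunk_text) pairs
    #   "choices": list of choice names assigned to the task
    completions: list = field(default_factory=list)


def parse_query(query: str):
    """Split 'kind: value' into a lowercase kind and a normalized value."""
    kind, _, value = query.partition(":")
    kind = kind.strip().lower()
    value = value.strip().lower()
    if kind == "text":
        return kind, value
    if kind == "label":
        label, _, chunk = value.partition("=")
        # chunk is None when the query names only a label
        return kind, (label.strip(), chunk.strip() or None)
    if kind == "choice":
        return kind, {c.strip() for c in value.split(",") if c.strip()}
    raise ValueError(f"unsupported query: {query!r}")


def matches(task: Task, query: str) -> bool:
    """Return True if the task satisfies the query, ignoring case."""
    kind, value = parse_query(query)
    if kind == "text":
        return value in task.text.lower()
    for completion in task.completions:
        if kind == "label":
            label, chunk = value
            for lab, chunk_text in completion.get("chunks", []):
                # Assumption: the labeled text must equal the queried text
                if lab.lower() == label and (chunk is None or chunk_text.lower() == chunk):
                    return True
        else:  # kind == "choice": every listed choice must be present
            choices = {c.lower() for c in completion.get("choices", [])}
            if value and value <= choices:
                return True
    return False
```

Lowercasing both the query and the task data is what makes the three spellings of label: ABC=DEF equivalent in this sketch.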

Keyword-based Search at Task Level

NLP Lab supports task-level keyword-based searches. The keyword-based search feature works for text and Visual NER projects alike.

The search works across all pages of a paginated task.
It is also possible to navigate between search results, even when the next result is located on another page.
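As a rough illustration of searching across pages and stepping between results, the sketch below collects every hit in a paginated task and moves a cursor through them. The function find_hits and the class ResultNavigator are hypothetical names; the case-insensitive matching and the wrap-around from the last result back to the first are assumptions, not documented NLP Lab behavior.

```python
def find_hits(pages, keyword):
    """Return (page_index, offset) for every case-insensitive hit.

    Offsets are positions in the lowercased page text.
    """
    needle = keyword.lower()
    hits = []
    if not needle:
        return hits
    for page_index, page_text in enumerate(pages):
        haystack = page_text.lower()
        start = haystack.find(needle)
        while start != -1:
            hits.append((page_index, start))
            start = haystack.find(needle, start + 1)
    return hits


class ResultNavigator:
    """Steps through hits in order, regardless of which page they are on."""

    def __init__(self, hits):
        self.hits = hits
        self.position = 0

    def current(self):
        return self.hits[self.position] if self.hits else None

    def next(self):
        if self.hits:
            # Assumption: wrap around after the last result
            self.position = (self.position + 1) % len(self.hits)
        return self.current()

    def previous(self):
        if self.hits:
            self.position = (self.position - 1) % len(self.hits)
        return self.current()
```

Because each hit carries its page index, a viewer built on this idea could switch pages whenever the next result lies outside the page currently shown.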

Important

In the NLP Annotation Lab, the search feature was implemented with the help of an HTML tag added to the Visual NER project configuration. In the NLP Lab, with the implementation of the task-level search feature, the previous search tag should be removed from existing Visual NER projects.

Config to be removed from all existing Visual NER projects:

<Search name="search" toName="image" placeholder="Search"/>
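For projects whose configuration is available as an XML string, the tag could be stripped programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name remove_search_tag and the surrounding example configuration (the View, Image, Labels, and Label elements) are assumptions made for illustration, and only the Search tag itself comes from the text above.

```python
import xml.etree.ElementTree as ET


def remove_search_tag(config_xml: str) -> str:
    """Return the configuration with every <Search> element removed."""
    root = ET.fromstring(config_xml)
    # Copy the iterators into lists so removal does not disturb traversal
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Search":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical Visual NER configuration containing the old search tag
example_config = """
<View>
  <Image name="image" value="$image"/>
  <Labels name="label" toName="image">
    <Label value="ABC"/>
  </Labels>
  <Search name="search" toName="image" placeholder="Search"/>
</View>
"""

cleaned_config = remove_search_tag(example_config)
```

Note that ElementTree re-serializes the document, so whitespace and attribute formatting may differ from the input, and XML comments are dropped by the default parser. Editing the configuration by hand in the project setup remains the simplest route for a single project.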

Keyword-based search in text tasks.

Keyword-based search in PDF/image tasks.

Chunk-based Search in Visual NER Tasks

In previous versions, users could only run token-based searches at page level. The search feature did not support searching for a collection of tokens as a single chunk. With this release, users can find a chunk of tokens in a Visual NER task.
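The difference between token-based and chunk-based search can be sketched with a sliding window over the page's tokens. This is an illustration under stated assumptions rather than the product's algorithm: the function find_chunk is a hypothetical name, tokens are represented as plain strings, the chunk is split on whitespace, and matching ignores case.

```python
def find_chunk(tokens, chunk):
    """Return (start, end) index pairs where consecutive tokens spell the chunk.

    The end index is exclusive, so tokens[start:end] is the matched chunk.
    """
    wanted = chunk.lower().split()
    words = [token.lower() for token in tokens]
    size = len(wanted)
    if size == 0:
        return []
    return [
        (start, start + size)
        for start in range(len(words) - size + 1)
        if words[start:start + size] == wanted
    ]
```

A single-word query reduces to the token-based case, while a multi-word query only matches where the tokens appear next to each other in order. Tokens with attached punctuation (for example "pressure,") would not match in this sketch without additional normalization.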

Getting Started is Easy

The NLP Lab is a free tool that can be deployed in a couple of clicks on the AWS and Azure Marketplaces, or installed on-premise with a one-line Kubernetes script. Get started here: https://nlp.johnsnowlabs.com/docs/en/alab/install



Our additional expert:

Dia Trambitas is an AI Product Manager with deep expertise in Natural Language Processing and applied Generative AI. At John Snow Labs, Dia has led the development of the Generative AI Lab — a no-code platform for data annotation and model training — as well as the Medical Chatbot, a secure and domain-specific conversational AI assistant tailored for clinical environments. With a strong focus on practical deployments of cutting-edge AI, she has worked at the intersection of healthcare and technology, driving product innovation that empowers users to harness large language models safely and effectively. Passionate about transforming unstructured data into actionable insights, Dia brings a strategic and user-centered approach to building AI tools that are both powerful and accessible.