Building Visual Document Classification Models in the No-Code Generative AI Lab

01.07.2024

Dia Trambitas, Ph.D.

Ph.D. in Computer Science – Head of Product

Classifying PDF documents using text-based classification models is a powerful capability Generative AI Lab provides. Users can now pre-annotate and classify images and PDF documents with over 1500 pre-trained models available in the shared repository – Models Hub.

Manual classification is also supported, allowing users to classify documents themselves. In both cases, documents are classified in their original form, preserving the integrity of PDFs without converting them to plain text.

New Project Type: Visual NLP Classification

Using this feature is easy: select the project type called “Visual NLP Classification” from the existing project template collection of Generative AI Lab and start configuring it. Here are the steps:

Go to the “Content Type” page.
Select the “Image” tab.
Choose “Visual NLP Classification” as the project type.

Visual Classification with Generative AI Lab

Users can classify both images and PDFs in their original form. This means working with the complete, original document, preserving its layout and content, rather than just classifying extracted text. Classification is easy, and the workflow and user interface are consistent with previous implementations:

1. Pre-annotation using Classification Models:

After selecting the project type, go to the “Reuse Resource” page.
Choose a classification model from the available pre-trained models.
Save the configuration.
Import OCR documents.
Once the tasks are imported, click on the pre-annotate button to classify them using the selected classification model.

2. Manual Classification:

After selecting the project type, go to the “Customize Labels” page.
Click on the “Choices” tab and add or remove choices for classification.
Click on the “Code” view and set the choice property in the Choice tag to multiple to allow more than one class per document.
Save the configuration.
Import OCR documents.
Open the tasks and classify them manually.
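
The “Code” view edit described in the manual classification steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML layout below follows the Label Studio-style convention (a Choices group wrapping individual Choice entries, with the choice attribute on the group) and may differ from what your project's “Code” view actually shows. The class names (Invoice, Letter, Report), the name/toName values, and the helper enable_multiple_choice are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling configuration. Tag names, attribute names and class
# values are assumptions for illustration; compare them with the XML shown
# in your own project's "Code" view before reusing anything here.
CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <Choices name="doc_class" toName="image" choice="single">
    <Choice value="Invoice"/>
    <Choice value="Letter"/>
    <Choice value="Report"/>
  </Choices>
</View>
"""


def enable_multiple_choice(config_xml: str) -> str:
    """Return the config with every choice group set to multiple selection."""
    root = ET.fromstring(config_xml.strip())
    for group in root.iter("Choices"):
        # Assumed attribute: choice="single" | "multiple" on the group tag.
        group.set("choice", "multiple")
    return ET.tostring(root, encoding="unicode")


updated = enable_multiple_choice(CONFIG)
print(updated)
```

In practice the same change is a one-word edit made by hand in the “Code” view; the script form is mainly useful for seeing exactly which attribute changes while the list of choices stays untouched.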

Getting Started is Easy

Generative AI Lab is a text annotation tool that can be deployed in a couple of clicks using either Amazon or Azure cloud providers, or installed on-premise with a one-line Kubernetes script.

Get started here: https://nlp.johnsnowlabs.com/docs/en/alab/install



Dia Trambitas is an AI Product Manager with deep expertise in Natural Language Processing and applied Generative AI. At John Snow Labs, Dia has led the development of the Generative AI Lab — a no-code platform for data annotation and model training — as well as the Medical Chatbot, a secure and domain-specific conversational AI assistant tailored for clinical environments. With a strong focus on practical deployments of cutting-edge AI, she has worked at the intersection of healthcare and technology, driving product innovation that empowers users to harness large language models safely and effectively. Passionate about transforming unstructured data into actionable insights, Dia brings a strategic and user-centered approach to building AI tools that are both powerful and accessible.