Open-Source Multimodal Data Ingestion and Enrichment at Scale with Spark NLP 6

May 28th, 2025 @ 2:00 PM ET

This webinar introduces the recently released Spark NLP 6.0, an Apache 2.0-licensed open-source Python library that lets you analyze large amounts of multimodal data for batch LLM inference or prepare data for RAG and LLM solutions – privately, efficiently, and at no cost. The library can run on a single machine or container, or scale natively on any Spark cluster without code changes. Spark NLP recently crossed 150M downloads, and this new release adds support for three major new use cases:

Support for ingesting and pre-processing PDF, Excel, PowerPoint, text, and image files. Prepare, analyze, and ingest all file formats into an LLM / RAG solution using one unified pipeline.
Visual language models! Multiple VLMs of different sizes & features are natively available as steps in processing pipelines, enabling you to extract facts and answers from images and visual PDF files.
Extract structure, semantics, and metadata from unstructured and visual data in all file formats – using batch inference at scale.

Join us to learn how to apply these new capabilities through a walkthrough of Python notebooks that showcase end-to-end scenarios.

Maziyar Panahi

Principal AI Engineer & Team Lead
at John Snow Labs

Maziyar Panahi is a Principal AI / ML Engineer and a senior Team Lead with over a decade of experience in public research. He leads the team behind Spark NLP at John Snow Labs, one of the most widely used NLP libraries in the enterprise.

He develops scalable NLP components using the latest techniques in deep learning and machine learning, including classic ML, language models, speech recognition, and computer vision. He is an expert in designing, deploying, and maintaining production-grade ML and DL models in the JVM ecosystem and on distributed computing engines such as Apache Spark.
