Webinars

Webinars presented by John Snow Labs

Wednesday, July 23rd @ 2pm ET

Consistent Linking, Tokenization, and Obfuscation for Regulatory-Grade De-Identification of Longitudinal Medical Data

In this webinar, we will delve into the intersection of data science innovation and regulatory compliance for de-identifying patient data across diverse sources and time points. This session focuses on three core capabilities of John Snow Labs’ software:

Consistent Obfuscation: Replace PHI fields—such as patient names, hospital names, and dates—with realistic yet fictitious counterparts in a gender- and context-aware manner. For instance, if “Jane Sunshine” is obfuscated to “Anne Boleyn,” any subsequent “Jane” will deterministically map to “Anne,” preserving referential consistency throughout the dataset.
Deterministic Tokenization: Transform patient identifiers (e.g., MRN or a composite of first name, last name, and birthdate) into cryptographic hashes. This ensures that subsequent records about the same individual—whether weeks or years later—are tokenized to the same value, enabling reliable linkage without exposing identifiable information.
Multimodal Linking: Seamlessly connect de-identified data spanning EHRs, claims, radiology reports (PDF), DICOM images, and free-text clinical notes. By applying consistent obfuscation and tokenization across formats, researchers can reconstruct longitudinal patient journeys while maintaining full compliance with privacy regulations.
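
The first two capabilities can be illustrated with a minimal Python sketch. It is not John Snow Labs' implementation: the secret key, the name pools, and the function names are all hypothetical, and a keyed hash (HMAC-SHA256) is assumed as one reasonable way to obtain deterministic tokens.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be loaded from a key vault.
SECRET_KEY = b"example-project-secret"

# Hypothetical pools of replacement first names, keyed by gender.
FAKE_FIRST_NAMES = {
    "F": ["Anne", "Maria", "Helen", "Sofia"],
    "M": ["Henry", "Omar", "Lucas", "Peter"],
}


def _normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return " ".join(value.lower().split())


def _keyed_digest(value: str) -> bytes:
    """HMAC-SHA256 of a string under the project secret."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def tokenize_patient(first: str, last: str, birthdate: str) -> str:
    """Map a composite identifier to a stable 64-character hex token."""
    composite = "|".join(_normalize(part) for part in (first, last, birthdate))
    return _keyed_digest(composite).hex()


def obfuscate_first_name(name: str, gender: str) -> str:
    """Pick a replacement deterministically from the gender-matched pool."""
    pool = FAKE_FIRST_NAMES[gender]
    index = int.from_bytes(_keyed_digest(_normalize(name))[:8], "big") % len(pool)
    return pool[index]


if __name__ == "__main__":
    # The same patient yields the same token in every record, whenever it is seen.
    print(tokenize_patient("Jane", "Sunshine", "1980-02-14"))
    # The same real name always maps to the same fictitious name.
    print(obfuscate_first_name("Jane", "F"))
```

Because both functions depend only on the input and the secret, records processed years apart or in different file formats receive the same token and the same surrogate name. With pools this small, two different real names can map to the same surrogate, so a real system would need much larger pools or a stored mapping.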

John Snow Labs’ de-identification models have been rigorously evaluated in peer-reviewed benchmarks for PHI detection, surpassing Azure Health Data Services, AWS Comprehend Medical, OpenAI’s GPT-4o and GPT-4.5, and Anthropic’s Claude 3.7 Sonnet. This solution not only exceeds the threshold for regulatory-grade accuracy but also outperforms human experts and general-purpose LLMs, ensuring both compliance with HIPAA/GDPR and the highest standards for research validity. Join us to see how John Snow Labs delivers proven, cost-effective, and scalable de-identification so you can accelerate your data science initiatives under the strictest privacy frameworks.

Youssef Mellah

Youssef Mellah, Ph.D., is a Senior Data Scientist and Machine Learning Engineer at John Snow Labs with more than 8 years of experience in artificial intelligence, natural language processing, and deep learning. He specializes in building, training, and deploying regulatory-grade ML/DL models and large language models (LLMs) for healthcare and life sciences, including the de-identification and tokenization of multimodal medical data. Youssef has a strong track record designing scalable, privacy-preserving AI solutions that enable compliant research and analytics across structured and unstructured data. He is passionate about advancing NLP technology, leading multidisciplinary teams, and transforming cutting-edge research into practical, real-world applications.

Recorded On:

Wednesday, May 28th @ 2pm ET

Watch Now

Open-Source Multimodal Data Ingestion and Enrichment at Scale with Spark NLP 6

This webinar introduces the recently released Spark NLP 6.0, an Apache 2.0 licensed open-source Python library which enables you to analyze large amounts of multi-modal data for batch LLM inference or to prepare data for RAG & LLM solutions – privately, efficiently, and at no cost. The library can operate on a single machine or container, or scale natively on any Spark hardware without code changes. Spark NLP recently crossed 150M downloads, and this new release adds support for 3 major new use cases:

Support for ingesting and pre-processing PDF, Excel, PowerPoint, text and image files. Prepare, analyze, and ingest all file formats into an LLM / RAG solution using one unified pipeline.
Visual language models! Multiple VLMs of different sizes & features are natively available as steps in processing pipelines, enabling you to extract facts and answers from images and visual PDF files.
Extract structure, semantics, and metadata from unstructured and visual data in all file formats – using batch inference at scale.

Join to learn how to apply these new capabilities by walking through Python notebooks showcasing end-to-end scenarios.

Maziyar Panahi

Maziyar Panahi is a Principal AI / ML engineer and a senior Team Lead with over a decade of experience in public research. He leads the team behind Spark NLP at John Snow Labs, one of the most widely used NLP libraries in the enterprise.

He develops scalable NLP components using the latest techniques in deep learning and machine learning, including classic ML, Language Models, Speech Recognition, and Computer Vision. He is an expert in designing, deploying, and maintaining ML and DL models in the JVM ecosystem and distributed computing engine (Apache Spark) at the production level.

Recorded On:

Wednesday, May 7th @ 2pm ET

Watch Now

Comparing Frontier LLMs on Analyzing Clinical Narratives

This webinar compares the performance of OpenAI’s GPT-4.5, Anthropic’s Claude 3.7 Sonnet, and John Snow Labs’ Medical LLM on the five most common tasks related to analyzing clinical notes:

Summarization: “Summarize the patient’s medical history and initial presentation.”
Information extraction: “What procedures did the patient undergo in the past year?”
Question answering: “What biomarkers are commonly negative in APL cases?”
De-identification: “Generate a HIPAA safe harbor anonymized version of this note.”
Clinical coding: “What are the billable ICD-10-CM codes for this visit?”

This is done via a blind evaluation by practicing medical doctors. The doctors are asked to decide which answer they prefer on the dimensions of factuality, clinical relevance, and conciseness. The evaluation methodology, including the measurement of inter-annotator agreement and drift over time, will be presented as well.
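
As a worked illustration of measuring inter-annotator agreement, here is a minimal Python sketch of Cohen's kappa for two raters. The abstract does not say which agreement statistic the evaluation uses, so the choice of kappa is an assumption, and the doctors' preference labels below are made up for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled at random with their own rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical blind preferences between answer "A" and answer "B".
    doctor_1 = ["A", "A", "B", "B", "A", "B"]
    doctor_2 = ["A", "B", "B", "B", "A", "A"]
    print(round(cohens_kappa(doctor_1, doctor_2), 3))
```

In this example the doctors agree on 4 of 6 items (0.667), while chance agreement is 0.5, giving a kappa of about 0.333. Recomputing the same statistic on the same items at a later date is one simple way to look at drift over time.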

David Talby

David Talby is the Chief Executive Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science.

David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK.

David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and Game Changers Awards in 2023.

Veysel Kocaman

Veysel is the Chief Technology Officer at John Snow Labs, improving the Spark NLP for the Healthcare library and delivering hands-on projects in Healthcare and Life Science. Holding a PhD degree in ML, Dr. Kocaman has authored more than 25 papers in peer reviewed journals and conferences in the last few years, focusing on solving real world problems in healthcare with NLP.

He is a seasoned data scientist with a strong background in every aspect of data science including machine learning, artificial intelligence, and big data with over ten years of experience. Veysel has broad consulting experience in Statistics, Data Science, Software Architecture, DevOps, Machine Learning, and AI to several start-ups, boot camps, and companies around the globe.

He also speaks at Data Science & AI events, conferences and workshops, and has delivered more than a hundred talks at international as well as national conferences and meetups.

Recorded On:

Wednesday, February 26th @ 2pm ET

Watch Now

An LLM-enabled Medical Terminology Server

Medical terminology servers help different systems speak the same language by providing a versioned, comprehensive, and always-current suite of medical codes. They also help organizations translate across specialized data models by enabling domain experts to manage custom code systems, value sets, and concept maps.

This session presents a fast and flexible terminology server which comes pre-loaded with all widely used medical terminologies, deploys privately behind your firewall, and provides a full API and user interface for advanced concept search, mapping, and normalization. Its standout capability is LLM-powered search which enables:

Identifying concepts when no exact match is found, useful for anything from correcting spelling mistakes to applying synonyms and hierarchies
Finding the most relevant concept for a given clinical context, great for finding specific codes for diagnoses, drugs, treatments, or adverse events
Identifying the semantically closest concept to a search term, great for multi-word terms that can be written in different ways like ICD-10 descriptions or prescriptions
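
To make the "no exact match" case concrete, here is a minimal Python sketch that recovers a concept from a misspelled query. It uses plain character similarity from the standard library's difflib as a stand-in, not the LLM-powered search the session describes, and the three-code table and the function name are hypothetical.

```python
import difflib

# Tiny illustrative excerpt of ICD-10-CM; a real server loads full code systems.
ICD10_CM_SAMPLE = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "J45.909": "Unspecified asthma, uncomplicated",
}


def closest_concepts(query, limit=3, cutoff=0.6):
    """Return (code, description) pairs whose description resembles the query."""
    by_description = {desc.lower(): code for code, desc in ICD10_CM_SAMPLE.items()}
    matches = difflib.get_close_matches(
        query.lower().strip(), list(by_description), n=limit, cutoff=cutoff
    )
    return [(by_description[m], ICD10_CM_SAMPLE[by_description[m]]) for m in matches]


if __name__ == "__main__":
    # A query with two spelling mistakes still resolves to a code.
    print(closest_concepts("essental hypertenson"))
```

Character similarity handles typos but not synonyms, hierarchies, or clinical context, which is where a semantic or LLM-based approach would be needed.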

Kate Weber

Kate Weber is a Senior Data Scientist at John Snow Labs who specializes in healthcare natural language processing and data standards. While completing her Ph.D. at the University of Michigan, she built algorithms to detect and classify evidence of substance use disorder in clinical notes, and pioneered approaches to using artifacts in the data annotation process to get the most out of precious labelled resources.

Her background in technical infrastructure and data engineering helps her understand the scope of the challenge facing enterprise health informatics teams. On her own time, she races bicycles and maintains the technical infrastructure for her family’s home-brewing and beekeeping adventures.

Recorded on:

Wednesday, January 22nd @ 2pm ET

Watch Now

Matching Patients with Clinical Guidelines

Healthcare systems, payers, and medical societies invest massive effort to maintain evidence-based clinical guidelines for a variety of conditions. However, when patients are in the hospital, often clinicians just don’t have the time to research or read these guidelines, leading to major gaps in how consistently they are applied. Recent advances in Medical AI can shortcut this problem by automatically reading the full history of a given patient, finding the most recent and relevant guideline for their clinical history, and presenting it in context.

This session will walk through the architecture of an end-to-end solution that does this using a state-of-the-art healthcare-specific LLM that can be deployed locally within an organization’s security perimeter to ensure privacy, compliance, and the ability to read organization-specific guideline documents. We’ll also show how to handle clinical guideline formats that are challenging for general-purpose LLMs, such as flowcharts, decision trees, and visual decision tables.

Veysel Kocaman

Veysel is the Head of Data Science at John Snow Labs, improving the Spark NLP for the Healthcare library and delivering hands-on projects in Healthcare and Life Science. Holding a PhD degree in ML, Dr. Kocaman has authored more than 25 papers in peer reviewed journals and conferences in the last few years, focusing on solving real world problems in healthcare with NLP.

He also speaks at Data Science & AI events, conferences and workshops, and has delivered more than a hundred talks at international as well as national conferences and meetups.

Recorded On:

Wednesday, December 11th @ 2pm ET

Watch Now

Integrating Multi-Modal Medical Data Into Unified Patient Journeys

Novel applications of Generative AI enable medical data analysis to go beyond single data points to make deductions about longitudinal, multi-modal patient histories. For example, NLP can extract tumor characteristics from a pathology report, computer vision can analyze a medical image, and time-series analysis can find anomalies in vital signs or claims data. We are now able to build a unified picture from a patient’s full, diverse history, taking all data into account and using common-sense reasoning to deal with data conflicts and gaps.

This enables applications such as automated creation of patient cohorts, question answering about patient histories, or patient matching applications (to clinical guidelines, to clinical trials, or to research protocols). This webinar covers a reference architecture for getting this done, using healthcare-specific large language models to deliver state-of-the-art accuracy on reproducible benchmarks. We’ll also cover how to address common challenges such as explaining results, helping clinicians refine their questions, handling uncertainty, and providing an enterprise-grade, compliant platform that is being deployed to analyze millions of patients and billions of documents.

David Talby

David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science.

Recorded On:

Wednesday, November 13th @ 2pm ET

Watch Now

De-identification of Medical Images in DICOM Format

De-identification of medical records is crucial for unlocking valuable information for several reasons: privacy, compliance, enabling medical research, and reducing the risk of data breaches. DICOM is a widely-used file format standard for exchanging medical images such as radiography, ultrasonography, computed tomography (CT), magnetic resonance imaging (MRI), and radiation therapy. Accurate anonymization of DICOM files presents unique challenges:

Sensitive information is often “burned” into the image, which requires computer vision or OCR to identify
Sensitive information is also stored in metadata fields, some of which include unstructured text
The DICOM standard is decades old, hence there are thousands of variants of file formats and metadata fields
Each DICOM file can contain thousands of images (slices), in different resolutions
Different image modalities (MRI vs. US vs. CT scans) have their own nuances

This session presents a scalable, enterprise-grade solution that provides high accuracy across multiple image formats and clinical modalities. Join to see live demos & code that tackle these challenges with the help of John Snow Labs’ Visual NLP. We will explore DICOM processing capabilities, from computing basic metrics on a potentially large dataset to de-identifying images and metadata. We will also discuss infrastructure and how to scale pipelines to handle heavy workloads.

Alberto Andreotti

Alberto Andreotti is a data scientist at John Snow Labs, specializing in Machine Learning, Natural Language Processing, and Distributed Computing. With a background in Computer Engineering, he has expertise in developing software for both Embedded Systems and Distributed Applications.

Alberto is skilled in Java and C++ programming, particularly for mobile platforms. His focus includes Machine Learning, High-Performance Computing (HPC), and Distributed Systems, making him a pivotal member of the John Snow Labs team.

Recorded on:

Wednesday, August 28, 2024 @ 2pm ET

Watch Now

Turnkey Deployment of Medical Language Models as Private API Endpoints

Join us for an insightful webinar showcasing how John Snow Labs’ signature models can now be effortlessly deployed as private API endpoints and seamlessly integrated into your healthcare text-processing workflows.

We will walk you through comprehensive, end-to-end examples demonstrating how to discover, deploy, and utilize John Snow Labs’ state-of-the-art language models, fine-tuned for the healthcare domain, across three leading marketplaces: AWS SageMaker, Snowflake Marketplace, and Databricks Marketplace.

Key highlights include:

Effortless Integration: Learn how easy it is to incorporate John Snow Labs language models into your existing workflows, enhancing efficiency and accuracy in medical text processing.
Flexible Endpoints: Discover the convenience of turning API endpoints on and off based on your processing needs, optimizing cost-effectiveness.
Scalable Infrastructure: Explore multiple infrastructure options designed to meet varying target scales and processing requirements.
Performance Benchmarks: Utilize accuracy and throughput benchmarks to make informed decisions about the most suitable infrastructure for your needs.

Don’t miss this opportunity to streamline your deployment process, reduce integration hassle, and elevate the performance of your healthcare applications.

Kshitiz Shakya

Kshitiz has over 10 years of experience working in software engineering. Currently, he works as a Software Engineer at John Snow Labs, building solutions and products that help businesses meet their needs. He is deeply passionate about creating scalable and trusted products that are robust and make people’s lives easier.

Recorded on:

Wednesday, July 31, 2024 @ 2pm ET

Watch now

Automated Testing of Bias, Fairness, and Robustness of Language Models in the Generative AI Lab

Testing and mitigating bias, fairness, and robustness issues in AI applications is now a legal requirement in the USA in regulated industries like healthcare, human resources, and financial services. This webinar presents new capabilities within the no-code Generative AI Lab, designed for building custom language models by non-technical domain experts, that enable compliance with such requirements and embody best practices for Responsible AI.

We’ll cover how you can:

Create, edit, and reuse test suites
Automatically generate test cases for robustness & bias
Manually review and edit tests when needed
Run LLM test suites and see both summarized and drill-down results
Run regression testing before certifying new versions of models, or competing models

This webinar is intended for anyone interested in testing, certifying, and mitigating bias issues in custom language models for real-world systems.

David Cecchini

Ph.D. at Tsinghua-Berkeley Shenzhen Institute | Data Scientist

Recorded on:

Wednesday, June 26, 2024 @ 2pm ET

Watch now

Fast, Cheap, Scalable: Open-Source LLM Inference with Spark NLP

Learn how the open-source Spark NLP library provides optimized and scalable LLM inference for high-volume text and image processing pipelines. This session dives into optimized LLM inference without the overhead of commercial APIs or extensive hardware setups. We will show live code examples and benchmarks comparing Spark NLP’s performance and cost-effectiveness against both commercial APIs and other open-source solutions.

Key Takeaways:

Learn how to efficiently process millions of LLM interactions daily, circumventing the costs associated with traditional LLM deployments.
Discover advanced methods for embedding LLM inference within existing data processing pipelines, enhancing throughput and reducing latency.
Review benchmarks that compare Spark NLP’s speed and cost metrics relative to commercial and open-source alternatives.

Danilo Burbano

Danilo Burbano is a Software and Machine Learning Engineer at John Snow Labs. He holds an MSc in Computer Science and has 13 years of commercial experience.

He has previously developed several software solutions over distributed system environments like microservices and big data pipelines across different industries and countries. Danilo has contributed to Spark NLP for the last 6 years. He is now working to maintain and evolve the Spark NLP library by continuously adding state-of-the-art NLP tools to allow the community to implement and deploy cutting-edge large-scale projects for AI and NLP.

Recorded on:

Wednesday, May 29, 2024 @ 2pm ET

Watch now

New State-of-the-art Accuracy for the 3 Primary Uses of Healthcare Language Models

This talk presents new levels of accuracy that have very recently been achieved, on public and independently reproducible benchmarks, on the three most common use cases for language models in healthcare:

Understanding clinical documents: Such as information extraction from clinical notes and reports; detecting entities, relationships, and medical codes; de-identification; and summarization.
Reasoning about patients: Fusing information across multiple modalities (tabular data, free text, imaging, omics) to create a longitudinal view of each patient, including making reasonable inferences and explaining them.
Answering medical questions: Answering medical licensing exam questions, biomedical research questions, and similar medical knowledge questions – accurately, without hallucinations, and while citing relevant sources.

Join to learn what has recently become possible in the fast-changing world of Healthcare AI.

David Talby

David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and Game Changers Awards in 2023.

Recorded on:

Tuesday, April 30, 2024 @ 2pm ET

Webinars

Consistent Linking, Tokenization, and Obfuscation for Regulatory-Grade De-Identification of Longitudinal Medical Data

Open-Source Multimodal Data Ingestion and Enrichment at Scale with Spark NLP 6

Comparing Frontier LLMs on Analyzing Clinical Narratives

An LLM-enabled Medical Terminology Server

Matching Patients with Clinical Guidelines

Integrating Multi-Modal Medical Data Into Unified Patient Journeys

De-identification of Medical Images in DICOM Format

Turnkey Deployment of Medical Language Models as Private API Endpoints

Automated Testing of Bias, Fairness, and Robustness of Language Models in the Generative AI Lab

Fast, Cheap, Scalable: Open-Source LLM Inference with Spark NLP

New State-of-the-art Accuracy for the 3 Primary Uses of Healthcare Language Models

The 2024 Generative AI in Healthcare Survey

John Snow Labs’ Native Integrations with LangChain and HayStack

Next-Gen Table Extraction from Visual Documents: Leveraging Multimodal AI

Building a RAG LLM Clinical Chatbot with John Snow Labs in Databricks

Introducing the Medical Research Chatbot

Extracting Social Determinants of Health from Free-Text Medical Records

From GPT-4 to Llama-2: Supercharging State-of-the-Art Embeddings for Vector Databases with Spark NLP

Contract Understanding with Legal NLP: building a Paralegal Service with AI

Deliver Safe, Fair & Robust Language Models with the NLPTest Library

Automated Summarization of Clinical Notes

Zero-Shot Visual Question Answering

State-Of-The-Art Medical Data De-identification and Obfuscation

Combining Prompt Engineering, Programmatic Labelling, and Model Tuning in the No-Code NLP Lab

NLP for Oncology: Extracting Staging, Histology, Tumor, Biomarker, and Treatment Facts from Clinical Notes

Zero-Shot Learning of Healthcare NLP Models

Automated Text Generation & Data-Augmentation for Medicine, Finance, Law, and E-Commerce

Text classification and named entity recognition with BertForTokenClassification & BertForSequenceClassification

Zero Shot Learning for Semantic Relation Extraction from Unstructured Text

Building Real-World Healthcare AI Projects from Concept to Production

Deeper Clinical Document Understanding Using Relation Extraction

Rule-Based and Pattern Matching for Entity Recognition in Spark NLP

Automating Clinical Trial Master File Migration & Information Extraction

Enterprise-Scale Data Labeling & Automated Model Training with the Free Annotation Lab

Creating a Clinical Knowledge Graph with Spark NLP and neo4j

1 Line of Code to Use 200+ State-of-the-Art Clinical & Biomedical NLP Models

Accurate Table Extraction from Documents & Images with Spark OCR

Speed Optimization & Benchmarks in Spark NLP 3: Making the Most of Modern Hardware

Visual Document Understanding with Multi-Modal Image & Text Mining in Spark OCR 3

Using & Expanding the NLP Models Hub

State-of-the-art Natural Language Processing for 200+ Languages with 1 Line of code

Automated Drug Adverse Event Detection from Unstructured Text

John Snow Labs NLU: Become a Data Science Superhero with One Line of Python code

Answering natural language questions

Accurate de-identification, obfuscation, and editing of scanned medical documents and images

Hardening a Cleanroom AI Platform to allow model training & inference on Protected Health Information

Maximizing Text Recognition Accuracy with Image Transformers in Spark OCR

Best Practices & Tools for Accurate Document Annotation and Data Abstraction

Automated Mapping of Clinical Entities from Natural Language Text to Medical Terminologies

AI Model Governance in a High-Compliance Industry

Accurate De-Identification of Structured & Unstructured Medical Data at Scale

State-of-the-art named entity recognition with BERT