As healthcare organizations increasingly adopt Large Language Models (LLMs) and Vision Language Models (VLMs) for clinical applications, the question of “how to deploy these models securely and efficiently” becomes critical. At John Snow Labs, we’ve worked closely with Databricks to provide three flexible deployment options that balance performance, control, and ease of use — all while maintaining the security and compliance standards that healthcare demands.
In this post, I’ll walk you through each deployment option based on a recent customer demonstration, showing you exactly how to leverage our state-of-the-art medical LLMs on Databricks infrastructure.
Why Medical-Specific LLMs Dramatically Outperform General Models
Before diving into deployment options, let’s establish why specialized medical LLMs aren’t just incrementally better — they’re transformationally superior for healthcare applications.
OpenMed Benchmark Results
Our JSL-VL-30B (Vision Language Model) achieves 83.5% average accuracy across critical medical domains.
Multi-Modal Medical Benchmark: Vision + Reasoning
On vision-based medical understanding tasks, our models deliver 15–50% better performance where visual medical reasoning is required, which is critical for processing medical images, charts, and handwritten notes.
MedHELM: Healthcare-Specific Evaluation
On Stanford's MedHELM benchmark, which evaluates models on realistic healthcare tasks, our medical LLMs consistently outperform general-purpose models such as GPT-4o, Claude, and Med-PaLM-2.
John Snow Labs Medical LLM Offerings on Marketplaces
There is overwhelming evidence from both academic research and industry benchmarks that domain-specific, task-optimized large language models consistently outperform general-purpose LLMs in healthcare. At John Snow Labs, we’ve developed a suite of Medical LLMs purpose-built for clinical, biomedical, and life sciences applications.
Our models are designed to deliver best-in-class performance across a wide range of medical tasks — from clinical reasoning and diagnostics to medical research comprehension and genetic analysis.
Three Deployment Options on Databricks
Option 1: Healthcare NLP Library (Direct Integration)
The Healthcare NLP Library is a powerful component of John Snow Labs, designed to facilitate NLP tasks within the healthcare domain. This library provides over 2,500 pre-trained models and pipelines tailored for medical data, enabling accurate information extraction, NER for clinical and medical concepts, and text analysis capabilities. Regularly updated and built with cutting-edge algorithms, the Healthcare library aims to streamline information processing and empower healthcare professionals with deeper insights from unstructured medical data sources, such as electronic health records, clinical notes, and biomedical literature.
John Snow Labs has created custom large language models (LLMs) tailored to diverse healthcare use cases. These models come in different sizes and quantization levels and are designed to handle tasks such as summarizing medical notes, answering questions, performing retrieval-augmented generation (RAG), recognizing named entities, and powering healthcare-related chat.
Best for: Development, testing, and production workloads where you need maximum flexibility and integration with existing Spark NLP pipelines (see the Healthcare NLP documentation for details).
This is our native approach where medical LLMs run directly within your Databricks notebooks using the Healthcare NLP library. You get access to our full suite of models across various sizes.
from johnsnowlabs import nlp, medical

# Initialize a Spark session with your John Snow Labs license
spark = nlp.start()

# Document assembler turns raw text into Spark NLP documents
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Load the medical LLM
# Recommended: 8B q8 model for an A10 GPU (24 GB VRAM)
medical_llm = medical.AutoGGUFModel.pretrained("jsl_meds_8b_q8_v4", "en", "clinical/models") \
    .setInputCols("document") \
    .setOutputCol("completions") \
    .setBatchSize(1) \
    .setNPredict(500) \
    .setUseChatTemplate(True) \
    .setTemperature(0)

# Create the pipeline
pipeline = nlp.Pipeline(stages=[
    document_assembler,
    medical_llm
])

# Process clinical data (any DataFrame with a "text" column)
data = spark.createDataFrame(
    [["Summarize this clinical note: ..."]], ["text"]
)
results = pipeline.fit(data).transform(data)
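Once the pipeline has run, the generated text sits in the completions annotation column. A quick way to inspect it, assuming the standard Spark NLP annotation schema where each annotation exposes a result field:

# The generated text lives in the `result` field of each annotation
results.select("completions.result").show(truncate=False)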
Option 2: Databricks Container Services
Best for: Teams needing containerized deployments with GPU acceleration and reproducible environments.
This option uses a pre-built Docker container that you deploy on your Databricks cluster. The container includes all necessary dependencies, optimized libraries, and model artifacts.
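For orientation, creating such a cluster through the Databricks Clusters API looks roughly like the sketch below. The image reference, node type, and the workspace_url and databricks_token variables are all placeholders, not actual John Snow Labs artifact names; use the image and registry credentials supplied with your subscription.

import os
import requests

# Placeholder values -- substitute your own workspace and image reference
workspace_url = "https://<your-workspace>.databricks.com"
databricks_token = os.environ["DATABRICKS_TOKEN"]

cluster_spec = {
    "cluster_name": "jsl-medical-llm-gpu",
    "spark_version": "14.3.x-scala2.12",          # standard runtime; ML runtimes don't support custom containers
    "node_type_id": "Standard_NC24ads_A100_v4",   # any GPU-enabled node type
    "num_workers": 1,
    "docker_image": {
        "url": "<your-registry>/<jsl-healthcare-nlp-image>:<tag>",
        "basic_auth": {
            "username": "<registry-user>",
            "password": "<registry-token>"
        }
    }
}

response = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {databricks_token}"},
    json=cluster_spec,
)
response.raise_for_status()
print("Created cluster:", response.json()["cluster_id"])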
from johnsnowlabs import nlp, medical

# Models are pre-loaded in the container
available_models = [
    "jsl_medm_q8_v3",         # 14B general medical model
    "jsl_meds_8b_q8_v4",      # 8B specialized model
    "jsl_meds_vlm_3b_q8_v1",  # 3B vision model
    "jsl_meds_ner_q8_v4",     # 3.5B entity extraction
]

# Load a vision model for document processing
vlm = medical.AutoGGUFModel.pretrained("jsl_meds_vlm_3b_q8_v1", "en", "clinical/models") \
    .setInputCols("document") \
    .setOutputCol("extracted_data") \
    .setUseChatTemplate(True) \
    .setTemperature(0)

# Prompt for structured extraction
extraction_prompt = """
Extract the following information from this medical document:
1. Patient demographics (name, age, gender)
2. All laboratory test results with values and units
3. Any medications mentioned with dosage and frequency
Return as structured JSON.
"""

# Build the pipeline, reusing the document assembler from Option 1
vlm_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    vlm.setPrompt(extraction_prompt)
])

# image_data: a DataFrame of document images prepared for the pipeline
# Single image processing: ~5-8 seconds on an H100
result = vlm_pipeline.fit(image_data).transform(image_data)
# Sample output
{
  "demographics": {
    "name": "Patient Name",
    "age": 45,
    "gender": "Male"
  },
  "lab_results": [
    {"test": "Hemoglobin", "value": "14.2", "unit": "g/dL"},
    {"test": "Glucose (fasting)", "value": "105", "unit": "mg/dL"}
  ],
  "medications": [
    {"name": "Metformin", "dosage": "500mg", "frequency": "twice daily"}
  ]
}
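Downstream systems usually want that extraction as a Python object rather than raw text. A minimal sketch, assuming the completion text contains the JSON shown above (column and field names follow the pipeline defined earlier):

import json

# Pull the raw completion text out of the annotation column
raw_text = result.select("extracted_data.result").first()[0][0]

# The model was asked for structured JSON, but guard against
# malformed output before handing it downstream
try:
    extraction = json.loads(raw_text)
except json.JSONDecodeError:
    extraction = None  # route to manual review or reprocessing

if extraction:
    for lab in extraction.get("lab_results", []):
        print(lab["test"], lab["value"], lab["unit"])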
Option 3: Databricks Private Endpoints
Best for: Production deployments requiring auto-scaling, minimal infrastructure management, and API-based access.
Private endpoints provide a fully managed service where Databricks handles infrastructure while keeping your data and models within your private network. This is our recommended approach for production at scale.
Subscription and setup details are covered in the Azure Databricks partner documentation: https://learn.microsoft.com/en-us/azure/databricks/partners/ml/john-snow-labs
import os
import requests

# Your private endpoint URL (provided after subscription);
# <your-workspace> is a placeholder for your workspace host
endpoint_url = "https://<your-workspace>.databricks.com/serving-endpoints/jsl-medical-llm-32b"

# Read the workspace token from the environment rather than hard-coding it
databricks_token = os.environ["DATABRICKS_TOKEN"]

headers = {
    "Authorization": f"Bearer {databricks_token}",
    "Content-Type": "application/json"
}

# Clinical summarization request
payload = {
    "inputs": {
        "prompt": """
Summarize the following clinical note into a concise assessment:
[Clinical note text here...]
""",
        "max_tokens": 500,
        "temperature": 0.0
    }
}

# Make a synchronous call
response = requests.post(endpoint_url, headers=headers, json=payload)
response.raise_for_status()
result = response.json()
print(result["completions"][0]["text"])
Private endpoints support two common autoscaling profiles. For cost-sensitive, intermittent workloads, scale to zero:

{
    "min_instances": 0,       # Scale to zero when idle
    "max_instances": 5,
    "scale_down_delay": "5m"
}
# Pros: Minimal cost during idle periods
# Cons: ~30-60s cold start on first request

For latency-sensitive production traffic, keep instances warm:

{
    "min_instances": 2,       # Always warm
    "max_instances": 20,
    "target_utilization": "60%",
    "scale_down_delay": "15m"
}
# Pros: No cold starts, smooth scaling
# Cons: Higher minimum cost
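If you opt for scale-to-zero, clients should be written to ride out the cold start. A minimal retry wrapper, under the assumption that the endpoint returns an HTTP 5xx status while a replica is still warming up:

import time
import requests

def call_with_retry(url, headers, payload, retries=5, backoff=15):
    """Retry a serving-endpoint call to absorb a ~30-60s cold start."""
    for attempt in range(retries):
        response = requests.post(url, headers=headers, json=payload, timeout=120)
        if response.status_code < 500:
            response.raise_for_status()
            return response.json()
        time.sleep(backoff)  # endpoint is likely still provisioning
    raise RuntimeError(f"Endpoint not ready after {retries} attempts")

result = call_with_retry(endpoint_url, headers, payload)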
Choosing the Right Deployment Option
Choose Healthcare NLP Library if:
- You need to combine LLMs with specialized NER/RE models
- Building complex multi-stage pipelines (e.g., NER → RAG → Summarization)
- Want maximum flexibility in prompt engineering and model chaining
- Already using Spark NLP for other healthcare tasks
Choose Container Services if:
- Standardizing deployments across multiple teams
- Need reproducible environments for compliance/audit
- Prefer Docker-based MLOps workflows
- Want to version-lock your entire stack
Choose Private Endpoints if:
- Processing variable workloads (e.g., batch jobs during specific hours)
- Building microservices or REST APIs consuming LLMs
- Want to minimize DevOps overhead
- Need auto-scaling for unpredictable traffic patterns
Key Takeaways
- John Snow Labs provides the most accurate medical LLMs on the market, outperforming GPT-4o, Claude, and Med-PaLM-2 on healthcare benchmarks by 4–11 percentage points
- Three flexible deployment options on Databricks let you balance control, scalability, and ease of use based on your specific requirements
- Vision Language Models (VLMs) can process complex medical documents, including handwritten notes and scanned images, with outputs that medical practitioners prefer 46–175% more often than GPT-4o's
- All deployments are HIPAA/GDPR compliant with no data leaving your secure environment — a critical advantage over cloud-based APIs
- Model sizes range from 2B to 70B parameters, allowing you to choose the right balance of accuracy and speed for your GPU resources and budget
- Specialized medical models (NER, RAG, SOAP generation) provide domain-specific capabilities not available in general-purpose LLMs
- Production-ready performance with 80–300 tokens/sec on modern GPUs and auto-scaling capabilities for variable workloads
Complete Notebooks:
- Loading Medical LLMs (Text-Only Models)
- Multi-Modal Vision Language Models
- Medical LLM on Databricks
📖 Documentation: