    Deploying John Snow Labs Medical LLMs on Databricks: Three Flexible Deployment Options

    As healthcare organizations increasingly adopt Large Language Models (LLMs) and Vision Language Models (VLMs) for clinical applications, the question of “how to deploy these models securely and efficiently” becomes critical. At John Snow Labs, we’ve worked closely with Databricks to provide three flexible deployment options that balance performance, control, and ease of use — all while maintaining the security and compliance standards that healthcare demands.

    In this post, I’ll walk you through each deployment option based on a recent customer demonstration, showing you exactly how to leverage our state-of-the-art medical LLMs on Databricks infrastructure.

    Why Medical-Specific LLMs Dramatically Outperform General Models

    Before diving into deployment options, let’s establish why specialized medical LLMs aren’t just incrementally better — they’re transformationally superior for healthcare applications.

    OpenMed Benchmark Results

    Our JSL-VL-30B (Vision Language Model) achieves 83.5% average accuracy across critical medical domains

    Multi-Modal Medical Benchmark: Vision + Reasoning

    On vision-based medical understanding benchmarks, our models show 15–50% better performance on tasks requiring visual medical reasoning, which is critical for processing medical images, charts, and handwritten notes.

    MedHELM: Healthcare-Specific Evaluation

    On Stanford’s MedHELM benchmark (healthcare-specific evaluation):

    John Snow Labs Medical LLM Offerings on Marketplaces

    There is overwhelming evidence from both academic research and industry benchmarks that domain-specific, task-optimized large language models consistently outperform general-purpose LLMs in healthcare. At John Snow Labs, we’ve developed a suite of Medical LLMs purpose-built for clinical, biomedical, and life sciences applications.

    Our models are designed to deliver best-in-class performance across a wide range of medical tasks — from clinical reasoning and diagnostics to medical research comprehension and genetic analysis.

    Three Deployment Options on Databricks

    Option 1: Healthcare NLP Library (Direct Integration)

    The Healthcare NLP Library is a powerful component of John Snow Labs, designed to facilitate NLP tasks within the healthcare domain. This library provides over 2,500 pre-trained models and pipelines tailored for medical data, enabling accurate information extraction, NER for clinical and medical concepts, and text analysis capabilities. Regularly updated and built with cutting-edge algorithms, the Healthcare library aims to streamline information processing and empower healthcare professionals with deeper insights from unstructured medical data sources, such as electronic health records, clinical notes, and biomedical literature.

    John Snow Labs has created custom large language models (LLMs) tailored for diverse healthcare use cases. These models come in different sizes and quantization levels, designed to handle tasks such as summarizing medical notes, answering questions, performing retrieval-augmented generation (RAG), named entity recognition and facilitating healthcare-related chats.

    Best for: Development, testing, and production workloads where you need maximum flexibility and integration with existing Spark NLP pipelines.

    This is our native approach where medical LLMs run directly within your Databricks notebooks using the Healthcare NLP library. You get access to our full suite of models across various sizes.

    from johnsnowlabs import nlp, medical
    
    # Initialize Spark session with your license
    spark = nlp.start()
    
    # Load document assembler
    document_assembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")
    
    # Load medical LLM
    # Recommended: 8B q8 model for A10 GPU (24GB VRAM)
    medical_llm = medical.AutoGGUFModel.pretrained("jsl_meds_8b_q8_v4", "en", "clinical/models")\
        .setInputCols("document")\
        .setOutputCol("completions")\
        .setBatchSize(1)\
        .setNPredict(500)\
        .setUseChatTemplate(True)\
        .setTemperature(0)
    
    # Create pipeline
    pipeline = nlp.Pipeline(stages=[
        document_assembler,
        medical_llm
    ])
    
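    # Process clinical data
    # Placeholder input: substitute your own DataFrame with a "text" column
    data = spark.createDataFrame([["[Clinical note text here...]"]]).toDF("text")
    results = pipeline.fit(data).transform(data)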
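The comment in the example above recommends an 8B q8 model for an A10 GPU with 24GB of VRAM. A rough way to check whether a given model size and quantization level fits a GPU can be sketched as below. The function names, the figure of roughly 8 bits per weight for q8, and the 20% headroom reserved for KV cache and activations are illustrative assumptions, not John Snow Labs sizing guidance.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_on_gpu(params_billion: float, bits_per_weight: float,
                vram_gb: float, headroom: float = 0.2) -> bool:
    """Return True if the weights fit while keeping `headroom` of VRAM free.

    The headroom fraction (KV cache, activations, runtime overhead) is an
    assumption; real needs depend on context length and batch size.
    """
    usable_gb = vram_gb * (1 - headroom)
    return estimate_weight_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # 8B parameters at roughly 8 bits per weight on a 24 GB GPU
    print(estimate_weight_gb(8, 8))
    print(fits_on_gpu(8, 8, vram_gb=24))
    # A 32B model at the same quantization on the same GPU
    print(fits_on_gpu(32, 8, vram_gb=24))
```

An estimate like this only narrows the choice; measuring actual memory use on the target cluster remains the reliable check.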

    Option 2: Databricks Container Services

    Best for: Teams needing containerized deployments with GPU acceleration and reproducible environments.

    This option uses a pre-built Docker container that you deploy on your Databricks cluster. The container includes all necessary dependencies, optimized libraries, and model artifacts.

    from johnsnowlabs import nlp, medical
    
    # Models are pre-loaded in the container
    available_models = [
        "jsl_medm_q8_v3",          # 14B general medical model
        "jsl_meds_8b_q8_v4",       # 8B specialized model
        "jsl_meds_vlm_3b_q8_v1",   # 3B vision model
        "jsl_meds_ner_q8_v4"       # 3.5B entity extraction
    ]
    
    # Load a vision model for document processing
    vlm = medical.AutoGGUFModel.pretrained("jsl_meds_vlm_3b_q8_v1", "en", "clinical/models")\
        .setInputCols("document")\
        .setOutputCol("extracted_data")\
        .setUseChatTemplate(True)\
        .setTemperature(0)
    
    # Prompt for structured extraction
    extraction_prompt = """
    Extract the following information from this medical document:
    1. Patient demographics (name, age, gender)
    2. All laboratory test results with values and units
    3. Any medications mentioned with dosage and frequency
    
    Return as structured JSON.
    """
    
    # Process document image
    vlm_pipeline = nlp.Pipeline(stages=[
        document_assembler,
        vlm.setPrompt(extraction_prompt)
    ])
    
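    # Single image processing: ~5-8 seconds on H100
    # A Pipeline must be fitted before it can transform; image_data is your input DataFrame
    result = vlm_pipeline.fit(image_data).transform(image_data)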
    
    # sample output
    
    {
      "demographics": {
        "name": "Patient Name",
        "age": 45,
        "gender": "Male"
      },
      "lab_results": [
        {"test": "Hemoglobin", "value": "14.2", "unit": "g/dL"},
        {"test": "Glucose (fasting)", "value": "105", "unit": "mg/dL"}
      ],
      "medications": [
        {
          "name": "Metformin",
          "dosage": "500mg",
          "frequency": "twice daily"
        }
      ]
    }
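Model output that is asked to be "structured JSON" can still arrive wrapped in extra text or with fields missing, so it is usually worth validating before it reaches downstream systems. The following is a minimal sketch, assuming the output is available as a plain string and uses the three top-level keys from the sample above. The helper name `parse_extraction` and the `REQUIRED_KEYS` tuple are hypothetical and not part of the Healthcare NLP library.

```python
import json

# Top-level keys taken from the sample output above
REQUIRED_KEYS = ("demographics", "lab_results", "medications")


def parse_extraction(raw_text: str) -> dict:
    """Parse the span from the first '{' to the last '}' and check required keys."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    parsed = json.loads(raw_text[start:end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return parsed


if __name__ == "__main__":
    sample = (
        'Here is the result:\n'
        '{"demographics": {"name": "Patient Name", "age": 45, "gender": "Male"},'
        ' "lab_results": [{"test": "Hemoglobin", "value": "14.2", "unit": "g/dL"}],'
        ' "medications": []}'
    )
    record = parse_extraction(sample)
    print(record["lab_results"][0]["test"])
```

Raising on malformed output, rather than returning a partial record, keeps incomplete extractions from silently entering clinical data stores.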

    Option 3: Databricks Private Endpoints

    Best for: Production deployments requiring auto-scaling, minimal infrastructure management, and API-based access.

    Private endpoints provide a fully managed service where Databricks handles infrastructure while keeping your data and models within your private network. This is our recommended approach for production at scale.

    John Snow Labs Announces Turnkey Deployment of Medical Language Models as Private API Endpoints, Boosting Efficiency, Security, and Compliance Tool to Test and Evaluate Custom Language Models

    Medical LLMs | John Snow Labs

    https://learn.microsoft.com/en-us/azure/databricks/partners/ml/john-snow-labs 

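    import os
    import requests
    
    # Your private endpoint URL (provided after subscription)
    endpoint_url = "https://.databricks.com/serving-endpoints/jsl-medical-llm-32b"
    
    # Access token read from the environment; the variable name DATABRICKS_TOKEN is an assumption
    databricks_token = os.environ["DATABRICKS_TOKEN"]
    
    headers = {
        "Authorization": f"Bearer {databricks_token}",
        "Content-Type": "application/json"
    }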
    
    # Clinical summarization request
    payload = {
        "inputs": {
            "prompt": """
            Summarize the following clinical note into a concise assessment:
            
            [Clinical note text here...]
            """,
            "max_tokens": 500,
            "temperature": 0.0
        }
    }
    
    # Make synchronous call and fail fast on HTTP errors
    response = requests.post(endpoint_url, headers=headers, json=payload)
    response.raise_for_status()
    result = response.json()
    
    print(result["completions"][0]["text"])
    
    Two example auto-scaling configurations for the endpoint follow. The first scales to zero when idle:
    
    {
        "min_instances": 0,  # Scale to zero when idle
        "max_instances": 5,
        "scale_down_delay": "5m"
    }
    # Pros: Minimal cost during idle periods
    # Cons: ~30-60s cold start on first request
    
    The second keeps instances warm at all times:
    
    {
        "min_instances": 2,  # Always warm
        "max_instances": 20,
        "target_utilization": "60%",
        "scale_down_delay": "15m"
    }
    # Pros: No cold starts, smooth scaling
    # Cons: Higher minimum cost
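With a scale-to-zero configuration, the first request after an idle period may fail or time out during the ~30-60s cold start. One common way for a client to absorb that is retrying with exponential backoff. The sketch below is illustrative: `call_with_retry` and its parameters are hypothetical names, and the default delays (5, 10, 20, 40 seconds) are assumptions chosen to cover a cold start of about a minute.

```python
import time
from typing import Callable


def call_with_retry(send: Callable[[], dict],
                    max_attempts: int = 5,
                    initial_delay_s: float = 5.0,
                    backoff: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `send` until it succeeds, waiting longer after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except Exception:
            # Broad catch for brevity; narrow this to network and HTTP errors in real use
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= backoff
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

In practice `send` would wrap the `requests.post(...)` call from the example above together with `response.raise_for_status()`, so that an error status during warm-up triggers a retry instead of being parsed as a result.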

    Choose Healthcare NLP Library if:

    • You need to combine LLMs with specialized NER/RE models
    • Building complex multi-stage pipelines (e.g., NER → RAG → Summarization)
    • Want maximum flexibility in prompt engineering and model chaining
    • Already using Spark NLP for other healthcare tasks

    Choose Container Services if:

    • Standardizing deployments across multiple teams
    • Need reproducible environments for compliance/audit
    • Prefer Docker-based MLOps workflows
    • Want to version-lock your entire stack

    Choose Private Endpoints if:

    • Processing variable workloads (e.g., batch jobs during specific hours)
    • Building microservices or REST APIs consuming LLMs
    • Want to minimize DevOps overhead
    • Need auto-scaling for unpredictable traffic patterns

    Key Takeaways

    • John Snow Labs provides the most accurate medical LLMs on the market, outperforming GPT-4o, Claude, and Med-PaLM-2 on healthcare benchmarks by 4–11 percentage points
    • Three flexible deployment options on Databricks let you balance control, scalability, and ease of use based on your specific requirements
    • Vision Language Models (VLMs) can process complex medical documents, including handwritten notes and scanned images, producing outputs that medical practitioners preferred 46–175% more often than those of GPT-4o
    • All deployments are HIPAA/GDPR compliant with no data leaving your secure environment — a critical advantage over cloud-based APIs
    • Model sizes range from 2B to 70B parameters, allowing you to choose the right balance of accuracy and speed for your GPU resources and budget
    • Specialized medical models (NER, RAG, SOAP generation) provide domain-specific capabilities not available in general-purpose LLMs
    • Production-ready performance with 80–300 tokens/sec on modern GPUs and auto-scaling capabilities for variable workloads
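To put the throughput figure in concrete terms, generation time for a response can be approximated as token count divided by tokens per second. The sketch below applies that to the 500-token limit used in the examples above. It ignores prompt processing, queueing, and network latency, and the function name is illustrative.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate `num_tokens` at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Bounds of the 80-300 tokens/sec range for a 500-token response
    print(generation_seconds(500, 80))
    print(round(generation_seconds(500, 300), 2))
```

At the lower bound a 500-token summary takes about 6 seconds, and at the upper bound under 2 seconds, which helps when deciding between interactive and batch use.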

    Our additional expert:
    Veysel is the Chief Technology Officer at John Snow Labs, improving the Spark NLP for the Healthcare library and delivering hands-on projects in Healthcare and Life Science. Holding a PhD degree in ML, Dr. Kocaman has authored more than 25 papers in peer reviewed journals and conferences in the last few years, focusing on solving real world problems in healthcare with NLP. He is a seasoned data scientist with a strong background in every aspect of data science including machine learning, artificial intelligence, and big data with over ten years of experience. Veysel has broad consulting experience in Statistics, Data Science, Software Architecture, DevOps, Machine Learning, and AI to several start-ups, boot camps, and companies around the globe. He also speaks at Data Science & AI events, conferences and workshops, and has delivered more than a hundred talks at international as well as national conferences and meetups.
