    Enhancing Oncology Patient Journeys with Document Understanding and Medical LLMs

    Data Scientist at John Snow Labs

    Why are patient journeys critical for oncology care?

    Oncology care is inherently longitudinal, complex, and personalized. Each patient’s history includes diagnostic imaging, pathology, lab results, clinical notes, treatment regimens, and in some cases adverse events. To deliver evidence-based, patient-specific care, clinicians need systems that present the right information at the right time.

    John Snow Labs and Roche Diagnostics Information Solutions have both demonstrated that healthcare-specific Large Language Models (LLMs) can enable scalable, automated generation of oncology patient journeys. These tools reduce manual data abstraction, extract high-fidelity timelines from unstructured records, and align clinical decisions with up-to-date guidelines like the ones published by the National Comprehensive Cancer Network (NCCN).

    What role does document understanding play in oncology timelines?

    Healthcare data is largely unstructured and heterogeneous. Optical Character Recognition (OCR) systems, Natural Language Processing (NLP) techniques, and entity recognition models form the backbone of document understanding, enabling:

    • Extraction of medications, cancer stages, biomarkers, procedures
    • Temporal anchoring of events such as diagnosis, relapse, or recurrence
    • Identification of chemotherapy cycles, surgery dates, and treatment response

    Roche’s Navify Oncology Hub exemplifies this with NLP-powered abstraction of breast, ovarian, and skin cancer timelines across EHRs. John Snow Labs supports the same workflow with more than 400 pretrained medical entity recognition and assertion detection models.

    Explore our Medical NLP Models.

    How is reasoning applied to infer and refine oncology data?

    After extracting events and entities, reasoning models:

    • Merge redundant entries (e.g., “left breast tumor” vs. “tumor”)
    • Infer clinical state from sparse data (e.g., BMI from weight and height)
    • Interpret temporal relations (e.g., “before”, “overlaps”, “starts on”)
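
    The first two steps above can be illustrated with a minimal Python sketch. It is not the method used by either vendor: the `Finding` record and the merging rule (a mention is redundant when all of its words appear in a longer mention from the same day) are assumptions made only for this example. The BMI formula is the standard one, weight in kilograms divided by height in metres squared.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one extracted mention."""
    text: str   # mention as extracted from the note
    day: date   # date the mention is anchored to


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def merge_redundant(findings: list[Finding]) -> list[Finding]:
    """Keep the most specific mention per day.

    Assumption for this sketch: a mention is redundant when every one of
    its words appears in another mention recorded on the same day.
    """
    kept: list[Finding] = []
    # Visit longer (more specific) mentions first so they absorb shorter ones.
    for f in sorted(findings, key=lambda f: len(f.text.split()), reverse=True):
        words = set(f.text.lower().split())
        if any(k.day == f.day and words <= set(k.text.lower().split()) for k in kept):
            continue
        kept.append(f)
    return sorted(kept, key=lambda f: (f.day, f.text))
```

    With this rule, “tumor” and “left breast tumor” recorded on the same day collapse into the single, more specific entry, while the same mentions on different days are kept apart. A production system would rely on concept normalization rather than word overlap.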

    Roche used prompt-engineered LLMs to classify relations between drug mentions and time expressions in EHRs. John Snow Labs’ Medical Reasoning LLMs, trained on 62,000 clinical reasoning examples, apply this methodology with high accuracy, enabling:

    • Timeline creation
    • Risk score calculation
    • Treatment decision support

    Learn more about our Reasoning LLMs.

    How do conversational LLMs support cohort discovery and care recommendations?

    After patient journeys are extracted from EHRs and structured into the OMOP Common Data Model, clinicians or analysts can query the database using natural language:

    “List all patients treated with taxol and carboplatin in 2023 who had liver metastases.”

    This agentic system:

    • Maps medical terms to standard codes (via Terminology Server)
    • Formulates and executes optimized, validated SQL queries
    • Returns consistent results with explanations
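
    To make the term-mapping and SQL steps concrete, here is a hedged sketch using Python's built-in `sqlite3` module. The table and column names follow the OMOP CDM (`drug_exposure`, `condition_occurrence`), but the schema is cut down to a few columns, the concept IDs are invented placeholders, and the `TERM_TO_CONCEPT` dictionary merely stands in for a terminology-server lookup. It does not reproduce the actual agent.

```python
import sqlite3

# Hypothetical concept IDs standing in for a terminology-server lookup.
TERM_TO_CONCEPT = {
    "taxol": 1001,             # illustrative ID, not a real OMOP concept
    "carboplatin": 1002,       # illustrative ID
    "liver metastases": 2001,  # illustrative ID
}


def build_cohort_query(drugs, condition, year):
    """Return (sql, params) for patients exposed to every drug in `year`
    who also have a record of `condition`."""
    drug_ids = [TERM_TO_CONCEPT[d.lower()] for d in drugs]
    cond_id = TERM_TO_CONCEPT[condition.lower()]
    placeholders = ", ".join("?" for _ in drug_ids)
    sql = f"""
        SELECT de.person_id
        FROM drug_exposure AS de
        WHERE de.drug_concept_id IN ({placeholders})
          AND strftime('%Y', de.drug_exposure_start_date) = ?
          AND EXISTS (
              SELECT 1 FROM condition_occurrence AS co
              WHERE co.person_id = de.person_id
                AND co.condition_concept_id = ?
          )
        GROUP BY de.person_id
        HAVING COUNT(DISTINCT de.drug_concept_id) = ?
        ORDER BY de.person_id
    """
    # Parameters are bound in the order the "?" markers appear in the SQL.
    params = [*drug_ids, str(year), cond_id, len(drug_ids)]
    return sql, params


def demo_database():
    """In-memory database with a tiny, made-up OMOP-like sample."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE drug_exposure (
            person_id INTEGER, drug_concept_id INTEGER,
            drug_exposure_start_date TEXT);
        CREATE TABLE condition_occurrence (
            person_id INTEGER, condition_concept_id INTEGER);
        INSERT INTO drug_exposure VALUES
            (1, 1001, '2023-02-01'), (1, 1002, '2023-02-01'),
            (2, 1001, '2023-05-10'),
            (3, 1001, '2022-11-03'), (3, 1002, '2022-11-03');
        INSERT INTO condition_occurrence VALUES (1, 2001), (2, 2001), (3, 2001);
    """)
    return con
```

    Calling `build_cohort_query(["taxol", "carboplatin"], "liver metastases", 2023)` and executing the result against `demo_database()` returns only patient 1. Patient 2 received one of the two drugs and patient 3 was treated in 2022. Binding values as parameters, rather than pasting them into the SQL string, is one simple way to keep generated queries safe to validate and run.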

    After this, Roche’s Smart Navigation capability within Navify aligns query results with the most relevant NCCN guideline section, supporting rapid, evidence-based treatment planning.

    Explore our Terminology Server.

    What insights emerged from Roche’s shared task on chemotherapy timeline extraction?

    In 2024, Roche and other partners conducted a benchmark task to extract chemotherapy timelines from real-world EHRs. Two subtasks were defined:

    • Subtask 1: Create timelines from gold-labeled drugs and time mentions
    • Subtask 2: End-to-end extraction from raw notes
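
    As a rough illustration of what Subtask 1 asks for, the sketch below turns already-labeled (drug, relation, date) triples into a chronological, deduplicated timeline. The triple layout, the relation labels, and the function name `build_timeline` are assumptions for this example and are not the official task format.

```python
from datetime import date


def build_timeline(relations):
    """Build a patient timeline from gold-labeled mentions.

    `relations` is an iterable of (drug, relation_label, iso_date) triples,
    where iso_date is a time expression already normalized to YYYY-MM-DD.
    Returns a chronologically sorted list of (iso_date, drug, relation_label)
    with duplicates removed and drug names lower-cased.
    """
    events = set()
    for drug, label, iso in relations:
        date.fromisoformat(iso)  # raises ValueError if not a valid ISO date
        events.add((iso, drug.lower(), label))
    # ISO dates sort correctly as plain strings.
    return sorted(events)
```

    The same drug mentioned twice with the same relation and date, for example in two different notes, yields a single timeline entry. Subtask 2 would additionally require producing the triples themselves from raw text.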

    Findings included:

    • Structured prompts significantly improved zero-shot LLM performance
    • Healthcare-specific models like JSL Med LLaMA-3 8B performed better than general LLMs
    • Event labeling, temporal normalization, and relation classification required modular workflows
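
    One way to picture a structured prompt for the relation classification step is the sketch below. The label set reuses the example relations mentioned earlier in this post plus a fallback `none`. The prompt wording and both helper functions are hypothetical and do not reproduce the prompts used in the benchmark.

```python
# Illustrative label set; the real task may use different labels.
RELATION_LABELS = ("before", "overlaps", "starts on", "none")


def build_relation_prompt(note_text: str, drug: str, time_expr: str) -> str:
    """Assemble a fixed-layout prompt asking for exactly one relation label."""
    labels = ", ".join(RELATION_LABELS)
    return (
        "Task: classify the temporal relation between a drug mention "
        "and a time expression.\n"
        f"Allowed labels: {labels}\n"
        f"Drug: {drug}\n"
        f"Time expression: {time_expr}\n"
        "Note:\n"
        f"{note_text.strip()}\n"
        "Answer with exactly one label."
    )


def parse_label(response: str) -> str:
    """Map a free-text model reply to an allowed label, or 'none' if it does not match."""
    cleaned = response.strip().lower().rstrip(".")
    return cleaned if cleaned in RELATION_LABELS else "none"
```

    Keeping the prompt builder and the reply parser as separate functions mirrors the modular design noted above: each stage can be tested and swapped independently of the model that sits between them.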

    How does this technology scale across enterprises?

    • John Snow Labs offers models that run on-premises or in private cloud
    • Roche deploys within Navify, securely integrated across care teams
    • Both support the OHDSI OMOP-CDM standard, ensuring interoperability

    With floating, server-based licenses and continuous updates, John Snow Labs helps organizations:

    • Ingest multi-modal data (notes, labs, FHIR)
    • Perform real-time question answering
    • Track patient outcomes across populations

    FAQs

    What types of cancer are supported for patient timelines? The system has been applied to breast, ovarian, and skin cancers, but supports any domain with structured and unstructured EHR data.

    How are NCCN guidelines integrated? Patient data is mapped to structured models, which index guideline content. Smart Navigation highlights the most relevant sections.

    Is this compliant with privacy regulations? Yes. All data processing occurs within the customer’s infrastructure, and no information, including Protected Health Information (PHI) or proprietary intellectual property, leaves the environment.

    How was model accuracy evaluated? Through blind reviews by medical professionals and public benchmark competitions (e.g., OpenMed, Chemotherapy Timeline Challenge).

     

    Supplementary Q&A

    What are the limitations of using LLMs in oncology workflows? General-purpose LLMs can hallucinate, struggle with rare diseases, and incur high compute costs. Modular pipelines built from smaller but more accurate domain-specific models, like those from John Snow Labs, mitigate these risks.

    How can hospitals and biopharma companies collaborate using this tech? Using common standards (OMOP-CDM, SNOMED CT) and de-identified patient graphs, institutions can collaborate on research, trials, and outcome optimization.

    What does a successful patient journey model enable?

    It allows clinicians to:

    • Visualize care progression across years
    • Make guideline-aligned treatment decisions
    • Monitor outcomes and care gaps at scale

    Our additional expert:
    Julio Bonis is a data scientist working on Healthcare NLP at John Snow Labs. Julio has broad experience in software development and design of complex data products within the scope of Real World Evidence (RWE) and Natural Language Processing (NLP). He also has substantial clinical and management experience – including entrepreneurship and Medical Affairs. Julio is a medical doctor specialized in Family Medicine (registered GP), has an Executive MBA – IESE, an MSc in Bioinformatics, and an MSc in Epidemiology.
