Enhancing Oncology Patient Journeys with Document Understanding and Medical LLMs

31.07.2025

Julio Bonis

Data Scientist at John Snow Labs

Why are patient journeys critical for oncology care?

Oncology care is inherently longitudinal, complex, and personalized. Each patient’s history includes diagnostic imaging, pathology, lab results, clinical notes, treatment regimens, and in some cases adverse events. To deliver evidence-based, patient-specific care, clinicians need systems that present the right information at the right time.

John Snow Labs and Roche Diagnostics Information Solutions have both demonstrated that healthcare-specific Large Language Models (LLMs) can enable scalable, automated generation of oncology patient journeys. These tools reduce manual data abstraction, extract high-fidelity timelines from unstructured records, and align clinical decisions with up-to-date guidelines like the ones published by the National Comprehensive Cancer Network (NCCN).

What role does document understanding play in oncology timelines?

Healthcare data is largely unstructured and heterogeneous. Optical Character Recognition (OCR) systems, Natural Language Processing techniques, and entity recognition models form the backbone of document understanding, enabling:

- Extraction of medications, cancer stages, biomarkers, procedures
- Temporal anchoring of events such as diagnosis, relapse, or recurrence
- Identification of chemotherapy cycles, surgery dates, and treatment response

Roche’s Navify Oncology Hub exemplifies this with NLP-powered abstraction of breast, ovarian, and skin cancer timelines across EHRs. John Snow Labs supports this with over 400 pretrained medical entity recognition and assertion detection models.

Explore our Medical NLP Models.

How is reasoning applied to infer and refine oncology data?

After extracting events and entities, reasoning models:

- Merge redundant entries (e.g., “left breast tumor” vs. “tumor”)
- Infer clinical state from sparse data (e.g., BMI from weight and height)
- Interpret temporal relations (e.g., “before”, “overlaps”, “starts on”)
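
Two of the reasoning steps above can be illustrated with a minimal Python sketch. It is not the John Snow Labs or Roche implementation: the `Interval` type, the `temporal_relation` function, and the coarse label set are assumptions made for illustration, and the only standard piece is the BMI formula (weight in kilograms divided by height in metres squared).

```python
from dataclasses import dataclass
from datetime import date


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


@dataclass(frozen=True)
class Interval:
    """A clinical event with inclusive start and end dates (hypothetical type)."""
    start: date
    end: date


def temporal_relation(a: Interval, b: Interval) -> str:
    """Classify how interval a relates to interval b using a few coarse labels."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.start == b.start:
        return "starts on"
    # Any remaining case shares at least one day with b.
    return "overlaps"


# Illustrative events, not taken from any real record.
surgery = Interval(date(2023, 1, 10), date(2023, 1, 10))
chemo = Interval(date(2023, 2, 1), date(2023, 5, 30))

print(round(bmi(70.0, 1.75), 1))          # 22.9
print(temporal_relation(surgery, chemo))  # before
```

A production system would need a richer relation set and would have to handle partial or uncertain dates, which this sketch ignores.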

Roche used prompt-engineered LLMs to classify relations between drug mentions and time expressions in EHRs. John Snow Labs’ Medical Reasoning LLMs, trained on 62,000 clinical reasoning examples, apply this methodology with high accuracy, enabling:

- Timeline creation
- Risk score calculation
- Treatment decision support

Learn more about our Reasoning LLMs.

How do conversational LLMs support cohort discovery and care recommendations?

After patient journeys are extracted from EHRs and structured into an OMOP common data model, clinicians or analysts can query the database using natural language:

“List all patients treated with taxol and carboplatin in 2023 who had liver metastases.”

This agentic system:

- Maps medical terms to standard codes (via Terminology Server)
- Formulates and executes optimized, validated SQL queries
- Returns consistent results with explanations
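
As a rough illustration of what the example question could translate to, the sketch below runs a cohort query against a tiny in-memory SQLite database. The table and column names (`drug_exposure`, `condition_occurrence`, `person_id`, and so on) follow the OMOP CDM, but the concept IDs, the `CONCEPTS` lookup, the `find_cohort` helper, and the sample rows are all hypothetical stand-ins. A real deployment would obtain concept IDs from a terminology service and would generate the SQL rather than hard-code it.

```python
import sqlite3

# Hypothetical concept IDs standing in for what a terminology service would return.
# "taxol" is a brand name for paclitaxel.
CONCEPTS = {"taxol": 1001, "carboplatin": 1002, "liver metastases": 2001}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_exposure (
    person_id INTEGER, drug_concept_id INTEGER, drug_exposure_start_date TEXT);
CREATE TABLE condition_occurrence (
    person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
""")
# Made-up sample rows: patient 1 matches, patient 2 lacks carboplatin,
# patient 3 was treated in 2022.
conn.executemany("INSERT INTO drug_exposure VALUES (?, ?, ?)", [
    (1, 1001, "2023-03-01"), (1, 1002, "2023-03-01"),
    (2, 1001, "2023-04-15"),
    (3, 1001, "2022-11-01"), (3, 1002, "2022-11-01"),
])
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", [
    (1, 2001, "2023-02-10"), (2, 2001, "2023-01-05"), (3, 2001, "2022-10-01"),
])

COHORT_SQL = """
SELECT d.person_id
FROM drug_exposure AS d
WHERE d.drug_concept_id IN (:drug_a, :drug_b)
  AND d.drug_exposure_start_date >= '2023-01-01'
  AND d.drug_exposure_start_date < '2024-01-01'
  AND EXISTS (
      SELECT 1 FROM condition_occurrence AS c
      WHERE c.person_id = d.person_id
        AND c.condition_concept_id = :condition)
GROUP BY d.person_id
HAVING COUNT(DISTINCT d.drug_concept_id) = 2
ORDER BY d.person_id
"""


def find_cohort(connection: sqlite3.Connection) -> list:
    """Return IDs of patients who received both drugs in 2023 and have the condition."""
    params = {
        "drug_a": CONCEPTS["taxol"],
        "drug_b": CONCEPTS["carboplatin"],
        "condition": CONCEPTS["liver metastases"],
    }
    rows = connection.execute(COHORT_SQL, params).fetchall()
    return [row[0] for row in rows]


print(find_cohort(conn))  # [1]
```

Bound parameters keep the mapped codes out of the SQL string itself, which makes a generated query easier to validate before it is executed.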

After this, Roche’s Smart Navigation capability within Navify aligns query results with the most relevant NCCN guideline section, supporting rapid, evidence-based treatment planning.

Explore our Terminology Server.

What insights emerged from Roche’s shared task on chemotherapy timeline extraction?

In 2024, Roche and other partners conducted a benchmark task to extract chemotherapy timelines from real-world EHRs. Two subtasks were defined:

- Subtask 1: Create timelines from gold-labeled drugs and time mentions
- Subtask 2: End-to-end extraction from raw notes

Findings included:

- Structured prompts significantly improved zero-shot LLM performance
- Healthcare-specific models like JSL Med LLaMA-3 8B performed better than general LLMs
- Event labeling, temporal normalization, and relation classification required modular workflows
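
To make the modular idea concrete, here is a minimal sketch of the last two stages, temporal normalization and timeline assembly, applied to already-labeled mentions. It is an assumption-laden illustration and not the shared task's reference pipeline: the relation labels, the two accepted date formats, and the `normalize_date` and `build_timeline` helpers are all hypothetical.

```python
from datetime import datetime

# Assumed input formats; real notes contain far more variety.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(expression: str):
    """Return an ISO 8601 date string, or None if the expression cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(expression.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def build_timeline(mentions):
    """Turn (drug, relation, time expression) mentions into a sorted, de-duplicated timeline."""
    timeline = set()
    for drug, relation, time_expression in mentions:
        iso = normalize_date(time_expression)
        if iso is not None:  # drop mentions whose time expression is unresolved
            timeline.add((drug.lower(), relation, iso))
    return sorted(timeline, key=lambda event: (event[2], event[0], event[1]))


# Made-up mentions; the relation labels are illustrative.
mentions = [
    ("Carboplatin", "begins-on", "03/01/2023"),
    ("carboplatin", "begins-on", "2023-03-01"),
    ("Paclitaxel", "ends-on", "2023-05-30"),
    ("Paclitaxel", "contains", "last spring"),
]

for event in build_timeline(mentions):
    print(event)
# ('carboplatin', 'begins-on', '2023-03-01')
# ('paclitaxel', 'ends-on', '2023-05-30')
```

Keeping normalization separate from relation classification means each stage can be evaluated and replaced on its own, which is the practical benefit of a modular workflow.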

How does this technology scale across enterprises?

- John Snow Labs offers models that run on-premises or in a private cloud
- Roche deploys within Navify, securely integrated across care teams
- Both support the OHDSI OMOP-CDM standard, ensuring interoperability

With floating, server-based licenses and continuous updates, John Snow Labs helps organizations:

- Ingest multi-modal data (notes, labs, FHIR)
- Perform real-time question answering
- Track patient outcomes across populations

FAQs

What types of cancer are supported for patient timelines? The system has been applied to breast, ovarian, and skin cancers, but supports any domain with structured and unstructured EHR data.

How are NCCN guidelines integrated? Patient data is mapped to structured models, which index guideline content. Smart navigation highlights the most relevant sections.

Is this compliant with privacy regulations? Yes. All data processing occurs within the customer’s infrastructure, and no information, including Protected Health Information (PHI) or proprietary intellectual property, leaves the environment.

How was model accuracy evaluated? Via blind reviews by medical professionals and public benchmark competitions (e.g., OpenMed, Chemotherapy Timeline Challenge).

Supplementary Q&A

What are the limitations of using LLMs in oncology workflows? General-purpose LLMs can hallucinate, struggle with rare diseases, and incur high compute costs. Modular pipelines built from smaller, more accurate domain-specific models, like those from John Snow Labs, mitigate these risks.

How can hospitals and biopharma companies collaborate using this tech? Using common standards (OMOP-CDM, SNOMED CT) and de-identified patient graphs, institutions can collaborate on research, trials, and outcome optimization.

What does a successful patient journey model enable?

It allows clinicians to:

- Visualize care progression across years
- Make guideline-aligned treatment decisions
- Monitor outcomes and care gaps at scale

Our additional expert:

Julio Bonis is a data scientist working on Healthcare NLP at John Snow Labs. Julio has broad experience in software development and design of complex data products within the scope of Real World Evidence (RWE) and Natural Language Processing (NLP). He also has substantial clinical and management experience – including entrepreneurship and Medical Affairs. Julio is a medical doctor specialized in Family Medicine (registered GP), has an Executive MBA – IESE, an MSc in Bioinformatics, and an MSc in Epidemiology.
