AI-Driven Oncology Insights: Unlocking Data from EHRs with NLP and LLMs

16.06.2025

Julio Bonis

Data Scientist at John Snow Labs

Electronic Health Records (EHRs) hold immense potential for improving oncology care. They contain detailed histories, diagnostic findings, treatment plans, and physician notes, all of which are essential for delivering personalized cancer care. Yet much of this data remains underutilized due to its unstructured format and inconsistencies in documentation.

Natural language processing (NLP) and large language models (LLMs) offer a way forward, enabling oncology teams to extract, structure, and analyze data at scale. This blog explores how AI is transforming oncology through better EHR utilization and the tangible benefits it offers clinicians, researchers, and administrators.

The Hidden Burden: EHR Documentation Errors in Oncology

Despite best intentions, EHRs often contain errors that affect downstream decision-making. A recent real-world study examining 776 cancer patients identified a documentation error rate of 15.3%¹. These mistakes can delay treatment, skew clinical research, and hinder quality reporting.

Healthcare NLP from John Snow Labs helps identify and correct inconsistencies by comparing notes across sources and surfacing anomalies. LLMs extend this by understanding medical context, flagging unlikely statements, and even suggesting plausible corrections.
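To make the idea of comparing notes across sources concrete, here is a minimal sketch in plain Python. It is not the Healthcare NLP API: the `Fact` record, the `find_conflicts` helper, and the sample values are all hypothetical, and it assumes an upstream extraction step has already produced one value per patient, source, and field.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted value (hypothetical schema, for illustration only)."""
    patient_id: str
    source: str   # e.g. "pathology", "oncology_note"
    field: str    # e.g. "stage"
    value: str


def find_conflicts(facts):
    """Group facts by (patient, field) and return those whose sources disagree."""
    grouped = defaultdict(dict)
    for f in facts:
        # Normalize case and whitespace so "IIIA" and "iiia" count as agreement.
        grouped[(f.patient_id, f.field)][f.source] = " ".join(f.value.lower().split())
    return {
        key: by_source
        for key, by_source in grouped.items()
        if len(set(by_source.values())) > 1
    }


facts = [
    Fact("p1", "pathology", "stage", "IIIA"),
    Fact("p1", "oncology_note", "stage", "iiia"),
    Fact("p2", "pathology", "stage", "IIB"),
    Fact("p2", "oncology_note", "stage", "IV"),
]
conflicts = find_conflicts(facts)
for (patient_id, field_name), by_source in conflicts.items():
    print(patient_id, field_name, by_source)
```

Only the second patient is surfaced, because the two sources report different stages. A real pipeline would also need clinical normalization (for example, mapping staging synonyms) before a disagreement is treated as an error.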

Scaling Manual Abstraction: From Dozens to Thousands of Notes

Manual data abstraction is a major bottleneck in clinical trials and outcomes research. For instance, a study of 622 lung cancer patients involved reviewing over 55,000 clinical notes to extract RECIST measurements². Scaling such efforts without automation is infeasible.

With tools like the Generative AI Lab, domain-specific LLMs dramatically accelerate this process. These models can identify key variables (e.g., tumor progression, drug regimens) and output structured data, enabling faster cohort identification and deeper insight generation.
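As a rough illustration of turning free text into structured output, the sketch below uses simple regular expressions as a stand-in for a trained model. The pattern table, the `ResponseRecord` schema, and the sample notes are assumptions made for this example, and the approach deliberately ignores negation and context, which is exactly what domain-specific models are meant to handle.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Toy patterns for the four RECIST response categories (illustrative only).
RESPONSE_PATTERNS = {
    "CR": re.compile(r"\bcomplete response\b", re.IGNORECASE),
    "PR": re.compile(r"\bpartial response\b", re.IGNORECASE),
    "SD": re.compile(r"\bstable disease\b", re.IGNORECASE),
    "PD": re.compile(r"\b(?:progressive disease|disease progression)\b", re.IGNORECASE),
}


@dataclass
class ResponseRecord:
    """Structured output for one note (hypothetical schema)."""
    note_id: str
    response: Optional[str]
    needs_review: bool


def extract_response(note_id, text):
    """Return the single response category mentioned, or flag the note for review."""
    hits = [code for code, pattern in RESPONSE_PATTERNS.items() if pattern.search(text)]
    # Zero or multiple categories are ambiguous, so leave them to a human abstractor.
    unambiguous = len(hits) == 1
    return ResponseRecord(note_id, hits[0] if unambiguous else None, not unambiguous)


notes = {
    "n1": "CT shows partial response in the right upper lobe.",
    "n2": "Stable disease vs. disease progression; discuss at tumor board.",
    "n3": "Patient tolerating therapy well.",
}
records = [asdict(extract_response(note_id, text)) for note_id, text in notes.items()]
for record in records:
    print(record)
```

Routing ambiguous notes to review rather than guessing keeps the structured table conservative, which matters when the output feeds cohort selection.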

Cancer-Specific NLP Models: From General AI to Precision Oncology

Generic LLMs often lack the granularity needed for oncology. John Snow Labs’ Healthcare NLP includes cancer-specific models trained on diverse oncology datasets, optimized for extracting tumor staging, treatment regimens, and progression metrics. These models are rigorously benchmarked to meet clinical precision needs, delivering reliable outputs that accelerate oncology workflows.

This shift from general-purpose NLP to precision-tuned models reflects a broader trend: clinical teams want models that speak the language of oncology. John Snow Labs offers Healthcare NLP pipelines fine-tuned for multiple cancer types, enhancing model performance and clinical trust.

Real-World Applications: From Risk Adjustment to Tumor Board Support

NLP and LLMs from John Snow Labs are already driving impact across oncology workflows:

Clinical Data Abstraction: Automating tumor staging and therapy extraction from radiology and pathology reports.
Risk Adjustment: Enhancing capture of HCC (Hierarchical Condition Category) codes from oncology notes to improve Medicare Advantage risk-adjustment accuracy.
Tumor Boards: Summarizing patient history and highlighting recent imaging or lab results.
Trial Matching: Identifying eligible patients from unstructured clinical narratives.

These capabilities reduce administrative burden and make actionable insights available in real time.
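Once variables have been extracted into structured records, an application such as trial matching reduces to filtering on those records. The following is a minimal sketch under that assumption; `PatientRecord`, `TrialCriteria`, `is_eligible`, and the sample data are hypothetical and far simpler than real eligibility criteria.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Structured view of one patient (hypothetical schema)."""
    patient_id: str
    cancer_type: str
    stage: str
    prior_therapies: set = field(default_factory=set)


@dataclass
class TrialCriteria:
    """Simplified inclusion and exclusion rules for one trial (hypothetical)."""
    cancer_type: str
    allowed_stages: frozenset
    excluded_therapies: frozenset = frozenset()


def is_eligible(patient, criteria):
    """True when the patient meets every inclusion rule and no exclusion rule."""
    return (
        patient.cancer_type == criteria.cancer_type
        and patient.stage in criteria.allowed_stages
        and not (patient.prior_therapies & criteria.excluded_therapies)
    )


criteria = TrialCriteria(
    cancer_type="lung",
    allowed_stages=frozenset({"IIIB", "IV"}),
    excluded_therapies=frozenset({"immunotherapy"}),
)
patients = [
    PatientRecord("p1", "lung", "IV", {"chemotherapy"}),
    PatientRecord("p2", "lung", "IV", {"immunotherapy"}),
    PatientRecord("p3", "breast", "IV"),
    PatientRecord("p4", "lung", "II"),
]
eligible = [p.patient_id for p in patients if is_eligible(p, criteria)]
print(eligible)
```

In practice such a filter would produce candidates for clinician review, not final enrollment decisions.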

Looking Ahead: Trust, Transparency, and Tailoring Models

As adoption grows, oncology stakeholders must ensure that NLP models and LLMs remain trustworthy and transparent. This involves:

Using explainable AI techniques to clarify model outputs.
Continuously retraining on validated oncology datasets.
Prioritizing compliance and auditability for clinical deployment with tools like Generative AI Lab.

The future of oncology insights lies not in replacing clinicians but in augmenting their work. By unlocking structured value from unstructured EHR data, Healthcare NLP and LLMs empower more informed decisions, timely interventions, and better patient outcomes.

To learn more about Healthcare NLP and its applications in oncology, explore our latest tools and case studies.

References

[1] H. Khela, J. Khalil, N. Daxon, Z. Neilson, T. Shahrokhi, P. Chung, and P. Wong, “Real world challenges in maintaining data integrity in electronic health records in a cancer program,” Journal of Biomedical Informatics, vol. 151, p. 104498, 2024. [Online]. Available: https://doi.org/10.1016/j.jbi.2024.104498

[2] Y. Li, Y.-H. Luo, J. A. Wampfler, S. M. Rubinstein, F. Tiryaki, K. A. V., J. L. Warner, H. Xu, and P. Yang, “Efficient and accurate extracting of unstructured EHRs on cancer therapy responses for the development of RECIST natural language processing tools: Part I, the corpus,” JCO Clinical Cancer Informatics, vol. 4, pp. 383–391, 2020. [Online]. Available: https://doi.org/10.1200/CCI.19.00147 (full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7265793/)


Our additional expert:

Julio Bonis is a data scientist working on Healthcare NLP at John Snow Labs. Julio has broad experience in software development and design of complex data products within the scope of Real World Evidence (RWE) and Natural Language Processing (NLP). He also has substantial clinical and management experience – including entrepreneurship and Medical Affairs. Julio is a medical doctor specialized in Family Medicine (registered GP), has an Executive MBA – IESE, an MSc in Bioinformatics, and an MSc in Epidemiology.
