Staying up to date with best practices in patient care is an ongoing challenge for clinicians. Many healthcare organizations develop internal clinical guidelines to harmonize decision making and raise healthcare standards.
In many cases, these documents are not available over the Internet, and they are often stored in formats that are not easily indexable, such as PDF, PPTX, or DOCX.
As a result, accessing the relevant information in these guidelines at the point of care is not easy, so a system able to locate that information quickly and conveniently for busy clinicians could help increase adherence to the guidelines.
In this session, Julio will walk through the design of a system that ingests documents in multiple formats with the help of the Spark OCR library, represents the corpus with domain-specific embeddings using Spark NLP for Healthcare pretrained models, and delivers, with very low latency, the information relevant to answering a clinical question expressed in natural language.
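The retrieval step described above can be sketched in a few lines. This is a minimal, stdlib-only illustration of embedding-based lookup: it assumes the passage vectors have already been computed upstream (e.g., by Spark NLP for Healthcare embeddings), and the toy three-dimensional vectors, corpus, and `answer` helper are illustrative, not part of any real API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy corpus: guideline passages paired with precomputed embedding vectors.
# In a real system these vectors would come from healthcare-specific
# embeddings and be served from a vector index for low-latency lookup.
corpus = [
    ("Start empiric antibiotics within one hour of sepsis recognition.", [0.9, 0.1, 0.0]),
    ("Annual retinal screening is recommended for diabetic patients.",   [0.1, 0.8, 0.2]),
    ("Use low-molecular-weight heparin for VTE prophylaxis.",            [0.0, 0.2, 0.9]),
]

def answer(query_vec, k=1):
    """Return the k passages most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query embedding close to the sepsis passage retrieves it first.
print(answer([0.85, 0.15, 0.05]))
```

In production the brute-force `sorted` scan would be replaced by an approximate nearest-neighbor index, but the ranking logic is the same.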
The explosive growth of scientific research about the novel coronavirus is one of the truly inspiring and hope-filled stories of this crisis – but it’s also a story of overwhelming data volume ringed by confusion and division. We’re seeing this play out in almost every domain. With the exponential growth of information and data in the world, we drown in the data that should inform us – distracted and diverted.
We built covid19primer.com to use NLP to make the coronavirus scientific literature accessible – adding structure, and connecting news and social conversations to provide context.
We’re also applying this approach to other domains, addressing the needs of analysts, government leaders, military commanders, and researchers alike. What have we learned so far, and how else can AI help address information overload?
Unstructured free-text medical notes are the only source for many critical facts in healthcare. As a result, accurate natural language processing is a critical component of many healthcare AI applications, such as clinical decision support, clinical pathway recommendation, cohort selection, and patient risk or abnormality detection. Recent advances in deep learning for NLP have enabled a new level of accuracy and scalability for clinical language understanding, making a broad set of applications possible for the first time.

The first part of this talk will cover the deep learning techniques, explainability features, and NLP pipeline architecture that have been applied. We’ll provide a short overview of the key underlying technologies: Spark NLP for Healthcare, BERT embeddings, and healthcare-specific embeddings. Then we’ll describe how these were applied to tackle the challenges of a healthcare setting: understanding clinical terminology, extracting specialty-specific facts of interest, and using transfer learning to minimize the required amount of task-specific annotation. The use of MLflow and its integration with Spark NLP to track experiments and reproduce results will also be covered.

The second part of the talk will cover automated deep learning: the system’s ability to train, tune, and measure models once clinical annotators add or correct labeled data. We will cover the annotation process and guidelines; why automation was required to handle the variety in clinical language across providers, document types, and geographies; and how this works in practice. Providing explainable results, including highlighting evidence in the text for extracted semantic facts, is another critical business requirement, and we’ll show how we’ve addressed it.
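The evidence-highlighting requirement mentioned above can be sketched as follows. This is a hypothetical, stdlib-only illustration: the span offsets would come from the NER stage of the NLP pipeline, and the `highlight` helper, the sample note, and the `SYMPTOM` label are all illustrative, not a Spark NLP API.

```python
def highlight(text, spans):
    """Wrap each (start, end, label) span in [label: ...] markers.

    Spans are applied right-to-left so that earlier character
    offsets remain valid as the text grows.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}: {text[start:end]}]" + text[end:]
    return text

# Toy clinical note with two extracted symptom spans (character offsets).
note = "Patient denies chest pain but reports shortness of breath."
spans = [(15, 25, "SYMPTOM"), (38, 57, "SYMPTOM")]
print(highlight(note, spans))
```

Showing the extracted fact alongside its highlighted source text lets clinicians verify each result against the original note, which is what makes the output explainable.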
This talk is intended for data scientists, software engineers, architects, and leaders who must design real-world clinical AI applications and are interested in lessons learned from applying the latest advances in NLP and deep learning in this space.