Electronic health records (EHRs) are the primary source of information for clinicians tracking the care of their patients. Because extracting information from unstructured text is inherently difficult and the healthcare domain demands a high level of precision, manual data abstraction has remained prevalent in the industry.
Despite several efforts to apply Machine Learning (ML) to information extraction from EHRs, deeper extraction that captures not only ‘what’ but also ‘how’ and ‘why’ has remained limited. Drawing a high-level picture of a patient's entire journey across multiple documents over several years has also been practically impossible.
In this talk, Veysel presents an end-to-end clinical document parsing pipeline that uses state-of-the-art named entity recognition (NER), text classification, assertion status detection, and relation classification models, all powered by the Spark NLP library. The pipeline is deployed in a Kubernetes cluster, where it can serve run-time requests over REST APIs and can also parse large volumes of documents in an Apache Spark cluster.
This system has already been deployed in a hospital setting, where it has saved hundreds of thousands of hours of manual abstraction so far.