Different data sources, such as structured data, clinical notes, and laboratory measurements, capture information about the human body at different time scales.
No large-scale study has yet recognized the multi-scale nature of these multi-modal data sources and demonstrated the value of integrating them.
While the value of natural language clinical notes has long been recognized by the medical informatics community, their integration has always been difficult because of the inherent technical challenges of extending core natural language processing techniques, such as entity and relation extraction, to domain-specific problems.
This talk will present a novel approach to this integration: we extract information from natural language clinical notes into structured form using Spark NLP, and then demonstrate the impact of the integration on prediction performance using a self-supervised graph transformer.