Dandelion Health is a provider of multimodal, longitudinal clinical data for healthcare innovators. This session shows how it built a de-identification process for free-text clinical notes, with John Snow Labs’ Healthcare NLP & LLM at its core. This process maintains patient privacy, minimizes risks for hospital systems, and preserves the bulk of free-text notes to provide researchers with high-fidelity clinical data.
Dandelion Health partners with hospital systems, de-identifies their clinical data in their environment, and then copies this data to the Dandelion data lake so that customers can perform research and validation within the secure Dandelion platform. To ensure HIPAA compliance, de-identification requires an expert determination to confirm that minimal protected health information (PHI) remains after the process.
Tabular data is straightforward to handle by removing or masking data fields with PHI-related values, such as patient names, birth dates, addresses, or contact details. Free-text patient notes are much more difficult to de-identify automatically, as this requires PHI words and phrases to be redacted or masked, after which the whole of the patient note must be verified.
Key topics of the presentation include:
1. Breaking down different note types (e.g. radiology reports, pathology reports, echo narratives, progress notes) according to level of risk, and adapting the de-id process accordingly.
2. Assessing note subtypes (e.g. radiology reports for DEXA scans, or fetal radiology reports) in order to carve out exceptions to our standard process (e.g. unique note structure, or age formats such as “27w” that need to be redacted).
3. Determining the importance of recall, precision, and PHI frequency for quasi-identifiers.
4. Applying pre-processing or enhancements such as HIPS (hiding in plain sight) to reduce risk based on the recall, precision, and frequency of PHI in free-text notes.
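To make topics 3 and 4 concrete, here is a minimal Python sketch of how recall, precision, and PHI frequency relate to residual risk, and of what a HIPS-style replacement might look like. It is an illustration under stated assumptions, not the pipeline described in the session: the function names, the tiny surrogate lists, and the example note are all hypothetical, and no John Snow Labs API is used.

```python
import random

# Hypothetical surrogate pools; a real system would draw from much larger lists.
SURROGATES = {
    "NAME": ["Alex Morgan", "Jamie Lee", "Sam Patel"],
    "DATE": ["03/14/2019", "11/02/2020", "07/21/2018"],
}

def span(text, phrase, label):
    """Locate the first occurrence of phrase and return a (start, end, label) span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

def precision_recall(gold, predicted):
    """Exact-match precision and recall over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall

def expected_leaks_per_note(phi_per_note, recall):
    """Expected number of PHI items per note that the detector misses."""
    return phi_per_note * (1.0 - recall)

def hide_in_plain_sight(text, predicted, rng):
    """Replace each detected span with a realistic surrogate of the same type."""
    pieces, cursor = [], 0
    for start, end, label in sorted(predicted):
        pieces.append(text[cursor:start])
        pieces.append(rng.choice(SURROGATES[label]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)

note = "Seen by Dr. Jane Roe on 01/05/2021. Follow-up with John Doe."
gold = [
    span(note, "Jane Roe", "NAME"),
    span(note, "01/05/2021", "DATE"),
    span(note, "John Doe", "NAME"),
]
predicted = gold[:2]  # assume the detector missed "John Doe"

precision, recall = precision_recall(gold, predicted)
leaks = expected_leaks_per_note(phi_per_note=len(gold), recall=recall)
masked = hide_in_plain_sight(note, predicted, random.Random(0))
print(precision, recall, leaks)
print(masked)
```

In this toy case one name is missed, so it stays in the output. Because the detected items are replaced with realistic surrogates rather than visible redaction tags, a reader cannot easily tell the leaked name apart from the surrogate ones, which is the intuition behind HIPS.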
This presentation features real-world case studies and examples, demonstrating the power of validating clinician data-quality hypotheses with language models, using different NLP & LLM strategies for different datasets, and letting QA/QC statistics tell the story, so we know that we’re doing right by the patient.