Full-service Medical Data De-identification

Software + Human Expertise

Best-in-class software to identify, mask, obfuscate, and generalize sensitive data

Working together with human domain experts to ensure high accuracy

State-of-the-art Accuracy

Built on Spark NLP for Healthcare

De-identify medical data with award-winning, publicly benchmarked models

End-to-end White-glove Service

Full service from needs analysis to operational support

De-identify all medical data: structured tables, free text, DICOM, scanned PDFs, and pathology images

The De-identification Service

1
Analyze
Human
  • HIPAA Safe Harbor, HIPAA Expert Determination
  • CCPA
  • GDPR pseudonymization, GDPR anonymization
Receive raw data
2
Identify
Software
  • ID, name, email, patient ID, SSN, credit card, address, birthday, phone, URL, license number
  • Physician name, hospital name, profession, employer, affiliation
  • Racial or ethnic origin, religion, political or union affiliation, biometric or genetic data, sexual practice or orientation
3
Measure
Human
  • Cleanroom AI Platform (on-site)
  • Annotation tool
  • Active learning
  • Accuracy measurement & agreement processes
  • Correct sampling
  • Multi-lingual
4
De-identify
Software
We support:
  • Tabular (headers, values)
  • Text (NER, text matching)
  • PDF: Text or Scanned
  • Images (OCR & metadata)
  • DICOM (OCR & metadata)
So you can:
  • Replace (or delete a field)
  • Mask (hash identifiers or shift dates)
  • Obfuscate (names, locations, organizations)
  • Generalize (disease codes, dates, addresses)
Deliver de-identified data
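To make the masking and generalizing operations above concrete, here is a minimal Python sketch that uses only the standard library. It is an illustration, not the service's implementation: the function names, the `SECRET_KEY` value, the 30-day shift window, and the three-digit ZIP rule are all assumptions chosen for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; in practice it would come from a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"


def hash_identifier(value: str, key: bytes = SECRET_KEY) -> str:
    """Mask an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def shift_date(d: date, patient_id: str, max_days: int = 30,
               key: bytes = SECRET_KEY) -> date:
    """Shift a date by a per-patient offset in [-max_days, max_days].

    The offset is derived from the patient ID, so every date for the same
    patient moves by the same amount and intervals between them are preserved.
    """
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


def generalize_date(d: date) -> str:
    """Generalize a full date to its year."""
    return str(d.year)


def generalize_zip(zip_code: str) -> str:
    """Generalize a ZIP code by keeping only its first three digits."""
    return zip_code[:3] + "**"


record = {
    "patient_id": "P001",
    "admitted": date(2021, 3, 14),
    "discharged": date(2021, 3, 20),
    "zip": "94110",
}

shifted_in = shift_date(record["admitted"], record["patient_id"])
shifted_out = shift_date(record["discharged"], record["patient_id"])

print(hash_identifier(record["patient_id"]))
print(shifted_out - shifted_in)  # length of stay is unchanged (6 days)
print(generalize_date(record["admitted"]), generalize_zip(record["zip"]))
```

Deriving the date offset from the patient ID rather than drawing it at random per record is one possible design choice: it keeps timelines within a patient consistent without storing a lookup table.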
5
Monitor
Human
  • Ongoing measurement & model improvement
  • Missed sensitive data
  • Incident response
  • GDPR & CCPA requests
  • Emergency unblinding
  • Audits

De-identification in Action

De-identify structured data

Automatically de-identify Protected Health Information (PHI) in structured datasets while enforcing GDPR and HIPAA compliance and maintaining linkage of clinical data across files.
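One common way to keep records linkable across files after de-identification is to replace the shared identifier with the same keyed pseudonym everywhere. The sketch below illustrates that idea with plain Python dictionaries; the table contents, field names, and the `KEY` value are hypothetical, and this is not necessarily how the service does it.

```python
import hashlib
import hmac

# Hypothetical secret; the same key must be used for every file to keep linkage.
KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = KEY) -> str:
    """Return a stable pseudonym for an identifier (truncated HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def deidentify_rows(rows, id_field, drop_fields):
    """Drop direct identifiers and replace the linking ID with its pseudonym."""
    cleaned = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in drop_fields}
        clean[id_field] = pseudonymize(row[id_field])
        cleaned.append(clean)
    return cleaned


# Two hypothetical files that share a patient_id column.
admissions = [
    {"patient_id": "P001", "name": "Jane Doe", "diagnosis": "E11.9"},
    {"patient_id": "P002", "name": "John Roe", "diagnosis": "I10"},
]
labs = [
    {"patient_id": "P001", "test": "HbA1c", "value": 7.2},
]

clean_admissions = deidentify_rows(admissions, "patient_id", {"name"})
clean_labs = deidentify_rows(labs, "patient_id", set())

# The pseudonymized ID still joins the two tables.
by_id = {row["patient_id"]: row for row in clean_admissions}
linked = [
    {**by_id[lab["patient_id"]], **lab}
    for lab in clean_labs
    if lab["patient_id"] in by_id
]
print(linked)
```

Because the pseudonym depends only on the key and the original ID, files processed at different times still link, provided the key is unchanged and kept secret.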

De-identify free text documents

De-identify free text documents by either masking or obfuscating PHI using out-of-the-box, high-accuracy Spark NLP for Healthcare models.
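The difference between masking and obfuscating can be shown with a small, library-free Python sketch. It assumes the PHI spans have already been found (for example by an NER model); the entity labels, the `SURROGATES` table, and the helper names are hypothetical and do not reflect the Spark NLP for Healthcare API.

```python
from typing import Callable, List, Tuple

Entity = Tuple[int, int, str]  # (start, end, label), end exclusive

# Hypothetical fixed surrogates; a real system would generate realistic fakes.
SURROGATES = {
    "NAME": "Alex Morgan",
    "HOSPITAL": "General Hospital",
    "DATE": "2020-01-01",
}


def find_span(text: str, surface: str, label: str) -> Entity:
    """Stand-in for an NER model: locate a known string and label it."""
    start = text.index(surface)
    return (start, start + len(surface), label)


def replace_spans(text: str, entities: List[Entity],
                  make_replacement: Callable[[str, str], str]) -> str:
    """Replace each span, working right to left so earlier offsets stay valid."""
    out = text
    for start, end, label in sorted(entities, reverse=True):
        out = out[:start] + make_replacement(label, text[start:end]) + out[end:]
    return out


def mask(text: str, entities: List[Entity]) -> str:
    """Masking: replace each PHI span with its entity label."""
    return replace_spans(text, entities, lambda label, _: f"<{label}>")


def obfuscate(text: str, entities: List[Entity]) -> str:
    """Obfuscation: replace each PHI span with a plausible surrogate value."""
    return replace_spans(
        text, entities, lambda label, _: SURROGATES.get(label, f"<{label}>")
    )


note = "John Smith was admitted to St. Mary Hospital on 2021-03-14."
entities = [
    find_span(note, "John Smith", "NAME"),
    find_span(note, "St. Mary Hospital", "HOSPITAL"),
    find_span(note, "2021-03-14", "DATE"),
]

print(mask(note, entities))
# <NAME> was admitted to <HOSPITAL> on <DATE>.
print(obfuscate(note, entities))
# Alex Morgan was admitted to General Hospital on 2020-01-01.
```

Masked text makes it obvious where PHI was removed, while obfuscated text stays readable and hides which values were replaced.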

De-identify DICOM documents

De-identify DICOM documents by masking PHI in the image and by either masking or obfuscating PHI in the metadata.

De-identify PDF documents – HIPAA Compliance

De-identify PDF documents according to HIPAA guidelines by masking PHI with out-of-the-box Spark NLP and Spark OCR models.

De-identify PDF documents – GDPR Compliance

De-identify PDF documents according to GDPR guidelines by anonymizing PHI with out-of-the-box Spark NLP and Spark OCR models.

De-identification Webinars