Best-in-class software to identify, mask, obfuscate, and generalize sensitive data
Working together with human domain experts to ensure high accuracy
Built on Spark NLP for Healthcare
De-identify medical data with award-winning, publicly benchmarked models
Full service from needs analysis to operational support
De-identify all medical data: structured tables, free text, DICOM, scanned PDFs, and pathology images
The De-identification Service
- HIPAA Safe Harbor, HIPAA Expert Determination
- GDPR pseudonymization, GDPR anonymization
- ID, name, email, patient ID, SSN, credit card, address, birthday, phone, URL, license number
- Physician name, hospital name, profession, employer, affiliation
- Racial or ethnic origin, religion, political or union affiliation, biometric or genetic data, sexual practice or orientation
- Cleanroom AI Platform (on-site)
- Annotation tool
- Active learning
- Accuracy measurement & agreement processes
- Correct sampling
- Tabular (headers, values)
- Text (NER, text matching)
- PDF (text or scanned)
- Images (OCR & metadata)
- DICOM (OCR & metadata)
- Replace (or delete a field)
- Mask (hash identifiers or shift dates)
- Obfuscate (name, locations, organizations)
- Generalize (disease codes, dates, addresses)
- Ongoing measurement & model improvement
- Missed sensitive data
- Incident response
- GDPR & CCPA requests
- Emergency unblinding
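The techniques listed above (hashing identifiers, shifting dates, generalizing dates and disease codes) can be illustrated with a minimal Python sketch. This is not the service's implementation: the function names, the secret key, the 30-day shift window, and the truncation of ICD-10 codes to their three-character category are all assumptions made for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical key for illustration only; a real deployment would manage this secret securely.
SECRET_KEY = b"example-key-not-for-production"


def hash_identifier(value: str, key: bytes = SECRET_KEY) -> str:
    """Mask an identifier with a keyed hash.

    The same input and key always yield the same token, so records
    that share an identifier can still be linked across files.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def shift_date(d: date, patient_id: str, max_days: int = 30,
               key: bytes = SECRET_KEY) -> date:
    """Shift a date by a per-patient offset in [-max_days, max_days].

    Every date of one patient moves by the same offset, so intervals
    between that patient's events are preserved.
    """
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


def generalize_date(d: date) -> str:
    """Generalize a full date to its year."""
    return str(d.year)


def generalize_icd10(code: str) -> str:
    """Generalize an ICD-10 code to its three-character category (assumed rule)."""
    return code.split(".")[0][:3]


print(hash_identifier("patient-0042"))
print(shift_date(date(2021, 5, 14), "patient-0042"))
print(generalize_date(date(1984, 3, 7)))   # 1984
print(generalize_icd10("E11.65"))          # E11
```

Deriving the date offset from the patient identifier rather than drawing it at random per record is one possible design choice: it keeps a patient's timeline internally consistent while hiding the true calendar dates.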
De-identification in Action
De-identify Protected Health Information (PHI) in structured datasets automatically while enforcing GDPR and HIPAA compliance and maintaining linkage of clinical data across files.
De-identify free-text documents by either masking or obfuscating PHI with out-of-the-box, high-accuracy Spark NLP for Healthcare models.
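The difference between masking and obfuscating free text can be sketched in a few lines of plain Python. This is an illustrative sketch, not the Spark NLP for Healthcare API: the `deidentify` function, the `SURROGATES` table, and the hand-built entity spans are hypothetical, and the spans stand in for what a named entity recognition model would normally detect.

```python
import hashlib

# Hypothetical surrogate values used for obfuscation.
SURROGATES = {
    "NAME": ["Alex Morgan", "Jamie Lee", "Sam Carter"],
    "HOSPITAL": ["Riverside Clinic", "Northgate Hospital"],
}


def _pick_surrogate(label: str, original: str) -> str:
    """Choose a surrogate deterministically so repeated mentions map to the same value."""
    options = SURROGATES.get(label)
    if not options:
        return f"<{label}>"  # fall back to masking when no surrogate list exists
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return options[int(digest, 16) % len(options)]


def deidentify(text: str, entities, mode: str = "mask") -> str:
    """Replace each (start, end, label) span with a label tag or a surrogate."""
    if mode not in ("mask", "obfuscate"):
        raise ValueError("mode must be 'mask' or 'obfuscate'")
    parts, cursor = [], 0
    for start, end, label in sorted(entities):
        parts.append(text[cursor:start])
        original = text[start:end]
        parts.append(f"<{label}>" if mode == "mask" else _pick_surrogate(label, original))
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


def span(text: str, phrase: str, label: str):
    """Build a (start, end, label) span; stands in for NER model output."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


text = "John Smith was admitted to Mercy Hospital."
entities = [span(text, "John Smith", "NAME"), span(text, "Mercy Hospital", "HOSPITAL")]

print(deidentify(text, entities, mode="mask"))
# <NAME> was admitted to <HOSPITAL>.
print(deidentify(text, entities, mode="obfuscate"))
```

Masking makes it obvious that something was removed, while obfuscation keeps the text reading naturally by substituting realistic but fake values.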
De-identify DICOM documents by masking PHI on the image and by either masking or obfuscating PHI in the metadata.
De-identify PDF documents under HIPAA guidelines by masking PHI with out-of-the-box Spark NLP and Spark OCR models.
De-identify PDF documents under GDPR guidelines by anonymizing PHI with out-of-the-box Spark NLP and Spark OCR models.