Full-service Medical Data De-identification
The De-identification Service
- Risk analysis
- Legal requirements review
- HIPAA Safe Harbor, HIPAA Expert Determination
- CCPA
- GDPR pseudonymization, GDPR anonymization
- Quality assurance strategy & process
- ID, name, email, patient ID, SSN, credit card, address, birthday, phone, URL, license number
- Physician name, hospital name, profession, employer, affiliation
- Racial or ethnic origin, religion, political or union affiliation, biometric or genetic data, sexual practice or orientation
- Cleanroom AI Platform (on-site)
- Annotation tool
- Active learning
- Accuracy Measurement & agreement processes
- Correct sampling
- Multi-lingual
- Tabular (headers, values)
- Text (NER, text matching)
- PDF: Text or Scanned
- Images (OCR & metadata)
- DICOM (OCR & metadata)
- Replace (or delete a field)
- Mask (hash identifiers or shift dates)
- Obfuscate (names, locations, organizations)
- Generalize (disease codes, dates, addresses)
- Ongoing measurement & model improvement
- Missed sensitive data
- Incident response
- GDPR & CCPA requests
- Emergency unblinding
- Audits
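
The replace, mask, and generalize operations listed above can be illustrated with a minimal Python sketch. This is not the service's implementation: the function names, the record layout, the salt, and the date offset are all hypothetical, chosen only to show what each operation does to a single record.

```python
import hashlib
from datetime import date, timedelta


def hash_identifier(value: str, salt: str) -> str:
    """Mask: replace an identifier with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def shift_date(value: date, offset_days: int) -> date:
    """Mask: move a date by a fixed offset so intervals between events survive."""
    return value + timedelta(days=offset_days)


def generalize_zip(zip_code: str) -> str:
    """Generalize: keep only the first three digits of a ZIP code."""
    return zip_code[:3] + "**"


def generalize_icd10(code: str) -> str:
    """Generalize: reduce an ICD-10 code to its three-character category."""
    return code.split(".")[0][:3]


def deidentify_record(record: dict, salt: str, offset_days: int) -> dict:
    """Apply one illustrative operation per field (hypothetical record layout)."""
    return {
        "name": "<NAME>",  # Replace: swap the value for a fixed placeholder
        "patient_id": hash_identifier(record["patient_id"], salt),
        "admitted": shift_date(record["admitted"], offset_days),
        "discharged": shift_date(record["discharged"], offset_days),
        "zip": generalize_zip(record["zip"]),
        "diagnosis": generalize_icd10(record["diagnosis"]),
    }


# Made-up example record; salt and offset are demo values, not recommendations.
record = {
    "name": "Jane Doe",
    "patient_id": "MRN-001234",
    "admitted": date(2021, 3, 14),
    "discharged": date(2021, 3, 20),
    "zip": "94110",
    "diagnosis": "E11.65",
}
clean = deidentify_record(record, salt="demo-salt", offset_days=-17)
print(clean)
```

Using the same offset for every date in a record keeps the length of stay intact (six days here) while hiding the real calendar dates. A plain salted hash is only a sketch of masking: when the identifier space is small, a keyed construction with a protected key is usually a safer choice.
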
Full range of features
Feature | John Snow Labs’ De-identification solutions | AWS Medical Comprehend | Microsoft Presidio | Google DLP |
---|---|---|---|---|
De-identification tool | | | | |
End-to-end service | | | | |
Also available as a standalone library | | | | |
Established new state-of-the-art accuracy in a peer-reviewed publication | | | | |
Real-world reference with >99% correctly recognized PHI | | | | |
Scanned PDF | Integrated | Separate service | Separate service | |
DICOM | Integrated | Separate service | Separate service | |
Obfuscation | | | | |
Multilingual support | | | | |
Built on a big data framework | | | | |
Standard pre-trained models can be fine-tuned | | | | |
Data does not leave your premises | | | | |
Works on an air-gapped server with no internet access | | | | |
- Entities available out of the box:
ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, ID, IDNUM, IPADDR, LICENSE, LOCATION, LOCATION-OTHER, MEDICALRECORD, NAME, ORGANIZATION, PATIENT, PHONE, PLATE, PROFESSION, SSN, STREET, STATE, URL, USERNAME, VIN, ZIP
- Easy to add other entities.
- Works with virtually any input – text, scanned PDF, DICOM, docx, pptx.
De-identification in Action
structured data
Automatically de-identify protected health information (PHI) in structured datasets while enforcing GDPR and HIPAA compliance and maintaining linkage of clinical data across files.
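
One common way to keep records linkable across files is to replace each patient identifier with a keyed pseudonym, so the same input always maps to the same token. The sketch below assumes that approach; it is not taken from the product, and the helper names, column names, and key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Map an identifier to a stable token using HMAC-SHA256 (truncated)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def deidentify_rows(rows, id_column, drop_columns, key):
    """Drop direct identifiers and pseudonymize the linking column."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        kept[id_column] = pseudonymize(row[id_column], key)
        cleaned.append(kept)
    return cleaned


# Two made-up "files" that share a patient_id column.
visits = [
    {"patient_id": "P-1001", "name": "Jane Doe", "visit_reason": "checkup"},
    {"patient_id": "P-1002", "name": "John Roe", "visit_reason": "fracture"},
]
labs = [
    {"patient_id": "P-1001", "ssn": "000-00-0000", "test": "HbA1c", "value": 6.1},
]

key = b"demo-key-keep-secret"  # hypothetical; a real key must be stored securely
clean_visits = deidentify_rows(visits, "patient_id", {"name"}, key)
clean_labs = deidentify_rows(labs, "patient_id", {"ssn"}, key)

# The pseudonymized column still joins the two files.
visits_by_token = {row["patient_id"]: row for row in clean_visits}
linked = [
    (visits_by_token[lab["patient_id"]]["visit_reason"], lab["test"])
    for lab in clean_labs
]
print(linked)
```

Because the token depends only on the key and the original value, every file processed with the same key can still be joined, while anyone without the key cannot recompute the mapping.
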

documents
De-identify free-text documents by either masking or obfuscating PHI using out-of-the-box, high-accuracy Spark NLP for Healthcare models.
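
The difference between masking and obfuscating can be shown with a small, library-free Python sketch. It does not use Spark NLP for Healthcare: the entity spans are supplied by hand where a named entity recognition model would normally produce them, and the surrogate lists and function names are hypothetical.

```python
import hashlib

# Hypothetical surrogate values used when obfuscating.
SURROGATES = {
    "NAME": ["Alex Morgan", "Sam Rivera", "Chris Lee"],
    "HOSPITAL": ["Lakeside Clinic", "Northgate Hospital"],
}


def surrogate_for(original: str, label: str) -> str:
    """Pick a fake value deterministically; fall back to a mask if none exists."""
    choices = SURROGATES.get(label)
    if not choices:
        return f"<{label}>"
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    return choices[digest[0] % len(choices)]


def deidentify_text(text: str, spans, mode: str = "mask") -> str:
    """Replace each (start, end, label) span with a mask or a surrogate."""
    result = text
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        original = text[start:end]
        if mode == "mask":
            replacement = f"<{label}>"
        else:
            replacement = surrogate_for(original, label)
        result = result[:start] + replacement + result[end:]
    return result


def span_of(text: str, phrase: str, label: str):
    """Helper for the demo: locate a phrase and return its labeled span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


text = "John Smith was admitted to Mercy Hospital on 2021-03-14."
spans = [
    span_of(text, "John Smith", "NAME"),
    span_of(text, "Mercy Hospital", "HOSPITAL"),
    span_of(text, "2021-03-14", "DATE"),
]

masked = deidentify_text(text, spans, mode="mask")
obfuscated = deidentify_text(text, spans, mode="obfuscate")
print(masked)
print(obfuscated)
```

Masking makes it obvious that something was removed, whereas obfuscation keeps the text readable and hides which values were altered. Choosing the surrogate from a hash of the original value means the same name is replaced consistently throughout a document.
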

documents
De-identify DICOM documents by masking PHI on the image and by either masking or obfuscating PHI in the metadata.

De-identify PDF documents following HIPAA guidelines by masking PHI using out-of-the-box Spark NLP and Spark OCR models.

De-identify PDF documents following GDPR guidelines by anonymizing PHI using out-of-the-box Spark NLP and Spark OCR models.
