Medical Data
De-identification

Automatically de-identify structured data, unstructured data, documents, PDF files, and images in compliance with HIPAA, GDPR, or custom needs
>99% accuracy on real-world documents

How Providence Health De-Identified 700 Million Patient Notes with Spark NLP

Accuracy:
99.19%, that is, above human accuracy

Performance:
2.46 hours to de-identify 500K patient notes.

Explore a full range of features

John Snow Labs’ De-identification solutions AWS Medical Comprehend Microsoft Presidio Google DLP
De-identification tool
End-to-end service
Available also as a standalone library
Established new state-of-the-art accuracy in a peer-reviewed publication
Real-world reference with >99% correctly recognized PHI
Scanned PDF Integrated Separate service Separate service
DICOM Integrated Separate service Separate service
Obfuscation
Multilingual support
Built on big data framework
Possible to fine-tune standard pre-trained models
Data does not leave your premises
Works on an air-gapped server with no internet access
  • Entities available out of the box:
    ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, ID, IDNUM, IPADDR, LICENSE, LOCATION, LOCATION-OTHER, MEDICALRECORD, NAME, ORGANIZATION, PATIENT, PHONE, PLATE, PROFESSION, SSN, STREET, STATE, URL, USERNAME, VIN, ZIP
  • Easy to add other entities.
  • Works with virtually any input – text, scanned PDF, DICOM, docx, pptx.

The De-identification Service

1
Analyze
Human
  • Risk analysis
  • Legal requirements review
  • HIPAA Safe Harbor, HIPAA Expert Determination
  • CCPA
  • GDPR pseudonymization, GDPR anonymization
  • Quality assurance strategy & process
Receive raw data
2
Identify
Software
  • ID, name, email, patient ID, SSN, credit card, address, birthday, phone, URL, license number
  • Physician name, hospital name, profession, employer, affiliation
  • Racial or ethnic origin, religion, political or union affiliation, biometric or genetic data, sexual practice or orientation
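To make the Identify step concrete, here is a minimal sketch of pattern-based detection for a few of the structured identifiers listed above (email, SSN, phone, URL). It is an illustration only and not how the service itself works: the `PATTERNS` table and the `find_entities` helper are hypothetical names, the phone and SSN patterns assume US formats, and names, hospitals, or professions would need trained NER models rather than regular expressions.

```python
import re

# Hypothetical patterns for illustration; real PHI detection needs NER models
# and far broader coverage than these four regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # assumes US format
    "PHONE": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),  # assumes US format
    "URL": re.compile(r"https?://[^\s,;]+"),
}


def find_entities(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)


note = "Contact jane.doe@example.org or 555-123-4567. SSN 123-45-6789."
entities = find_entities(note)
for start, end, label, value in entities:
    print(label, start, end, value)
```

Each tuple carries character offsets, so a later step can replace or mask the exact span it refers to.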
3
Measure
Human
  • Cleanroom AI Platform (on-site)
  • Annotation tool
  • Active learning
  • Accuracy Measurement & agreement processes
  • Correct sampling
  • Multi-lingual
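Accuracy measurement in the Measure step can be sketched as a comparison between human annotations and software output on a sample of documents. The example below is a simplified assumption rather than the service's actual procedure: `score` is a hypothetical helper, and it counts only exact matches of `(start, end, label)` spans, whereas a real process might also credit partial overlaps.

```python
def score(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Human-annotated spans versus spans found by software (made-up sample data).
gold = [(8, 28, "EMAIL"), (32, 44, "PHONE"), (50, 61, "SSN")]
predicted = [(8, 28, "EMAIL"), (50, 61, "SSN"), (0, 7, "NAME")]

precision, recall, f1 = score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For de-identification, recall is usually the figure to watch most closely, since every missed span is sensitive data left in the output.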
4
De-identify
Software
We support:
  • Tabular (headers, values)
  • Text (NER, text matching)
  • PDF: Text or Scanned
  • Images (OCR & metadata)
  • DICOM (OCR & metadata)
So you can:
  • Replace (or delete a field)
  • Mask (hash identifiers or shift dates)
  • Obfuscate (names, locations, organizations)
  • Generalize (disease codes, dates, addresses)
Deliver de-identified data
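The four operations in step 4 (replace, mask, obfuscate, generalize) can be illustrated with a short sketch using only the Python standard library. All names here (`SECRET_KEY`, `FAKE_NAMES`, and the helper functions) are hypothetical and chosen for illustration; this is not the product's API, and the keyed-hash approach is one possible design among several.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key, illustration only
FAKE_NAMES = ["Alex Morgan", "Sam Rivera", "Jordan Lee", "Casey Kim"]  # hypothetical


def _digest(value, key=SECRET_KEY):
    """Keyed SHA-256 digest, so tokens cannot be recomputed without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def replace_span(text, start, end, label):
    """Replace: swap the detected span for its entity label."""
    return text[:start] + f"<{label}>" + text[end:]


def mask_identifier(value):
    """Mask: the same identifier always maps to the same opaque token."""
    return _digest(value).hex()[:16]


def shift_date(d, patient_id, max_days=30):
    """Mask: shift by a per-patient offset so intervals within a record survive."""
    offset = int.from_bytes(_digest(patient_id)[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


def obfuscate_name(name):
    """Obfuscate: pick a consistent surrogate name from a fixed list."""
    return FAKE_NAMES[int.from_bytes(_digest(name)[:4], "big") % len(FAKE_NAMES)]


def generalize_date(d):
    """Generalize: keep only the year."""
    return str(d.year)


admitted = shift_date(date(2020, 3, 1), "patient-42")
discharged = shift_date(date(2020, 3, 9), "patient-42")
redacted = replace_span("Seen by Dr. Smith today", 12, 17, "DOCTOR")
print(redacted, mask_identifier("MRN-0012345"), (discharged - admitted).days)
```

Deriving the date offset and the surrogate name from a keyed hash keeps results consistent across documents for the same patient, which preserves the length of stay in the example while hiding the true dates.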
5
Monitor
Human
  • Ongoing measurement & model improvement
  • Missed sensitive data
  • Incident response
  • GDPR & CCPA requests
  • Emergency unblinding
  • Audits

Recognized by the Technology Experts