John Snow Labs for Academia

You have our full support for using John Snow Labs' Healthcare NLP, Visual NLP, and the Healthcare Data Library for open research & teaching projects.

This includes over 25,000 pre-trained models as well as the entire catalogue of over 2,300 expert-curated datasets in its Data Library.

Healthcare NLP

gives you access to state-of-the-art:
Clinical named entity recognition

train your own or use pre-trained models to extract clinical facts (symptoms, diagnoses, treatments, procedures), drug facts (name, strength, dosage, route, frequency, duration), and biomedical terms (organism, tissue, gene, gene product, chemical, …).

Assertion status detection

distinguish between positive assertions (“patient has diabetes”), negative assertions (“no fever”), uncertain assertions (“shows indications of depression”), and assertions about other people (“family history of lung cancer”).
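To make the four categories concrete, here is a minimal rule-based sketch in Python. It is not the Healthcare NLP API and not how the pre-trained models work; the cue phrases, the label names, and the `assertion_status` function are hypothetical and chosen only to cover the example sentences above.

```python
import re

# Hypothetical cue phrases, checked in order; the first match wins.
# A real system would use a trained model, not a short keyword list.
RULES = [
    ("someone_else", re.compile(r"\bfamily history of\b")),
    ("absent", re.compile(r"\b(no|denies|without)\b")),
    ("uncertain", re.compile(r"\b(indications of|possible|suspected)\b")),
]


def assertion_status(sentence):
    """Return a coarse assertion label for one sentence."""
    lowered = sentence.lower()
    for label, pattern in RULES:
        if pattern.search(lowered):
            return label
    # No cue found: treat the finding as asserted for the patient.
    return "present"


for text in [
    "patient has diabetes",
    "no fever",
    "shows indications of depression",
    "family history of lung cancer",
]:
    print(text, "->", assertion_status(text))
```

Keyword rules like these break down quickly on real clinical notes (scope of negation, double negatives, long sentences), which is the gap that trained assertion models are meant to close.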

Entity resolution

train your own or use pre-trained models to resolve recognized entities to SNOMED-CT, ICD-10-CM, ICD-10-PCS, CPT, or RxNorm.

Relation extraction

use pre-trained models to automatically identify relations between entities such as drugs, dosages, durations, frequencies, and clinical events, among many others.

Medical data normalization

normalize medications, lab results, vital signs, and demographic data – to simplify downstream analysis for extracted clinical information.


Anonymize either structured tables or unstructured free text, including all GDPR- and HIPAA-required fields, and then either remove, mask, or obfuscate PHI.
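The difference between removing, masking, and obfuscating can be illustrated with a small Python sketch. It does not use the Healthcare NLP API: the two regular expressions, the `deidentify` function, and the hash-based surrogate used for "obfuscate" are all assumptions made for illustration, and real PHI detection needs far broader coverage than two patterns.

```python
import hashlib
import re

# Hypothetical patterns for two identifier types only.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def deidentify(text, mode="mask"):
    """Apply one strategy ("remove", "mask", "obfuscate") to every match."""
    if mode not in {"remove", "mask", "obfuscate"}:
        raise ValueError(f"unknown mode: {mode}")

    def make_substitute(label):
        def substitute(match):
            if mode == "remove":
                return ""  # drop the value entirely
            if mode == "mask":
                return f"<{label}>"  # keep only the entity type
            # "obfuscate": assumed here to mean a consistent surrogate,
            # so the same input value always maps to the same token.
            digest = hashlib.sha256(match.group().encode("utf-8")).hexdigest()
            return f"{label}_{digest[:8]}"
        return substitute

    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(make_substitute(label), text)
    return text


note = "Seen on 2024-03-05, call 555-123-4567."
print(deidentify(note, "mask"))  # Seen on <DATE>, call <PHONE>.
print(deidentify(note, "remove"))
print(deidentify(note, "obfuscate"))
```

Masking keeps the entity type visible for downstream analysis, while a consistent surrogate preserves the ability to link repeated mentions of the same value without exposing it.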

Spark Visual NLP

allows you to accurately transform PDF, DOCX, DICOM, and image files to digital text with built-in algorithms for:

  • image pre-processing (binarization, thresholding, erosion, scaling, skew correction)
  • image cleansing (noise scorer, remove objects, morphology), and
  • handling of complex document layouts (LayoutAnalyzer, SplitRegions, DrawRegions, PositionFinder).

The Data Library

includes over 2,200 expert-curated datasets that are ready to download and use on your academic/research project:

  • Each dataset goes through 3 levels of quality review
  • Data is normalized into one unified type system
  • Data and Metadata
  • Always up to date

Our company is named after Dr. John Snow – the medical doctor who helped stop the outbreak of cholera in 1854 London by analyzing data.

We exist for the very purpose of empowering many more like him in the 21st century.

Dr. John Snow