Spark NLP in Action

Open Source

Recognize entities in text
Recognize Persons, Locations, Organizations and Misc entities using out-of-the-box pretrained Deep Learning models based on GloVe (glove_100d) and BERT (ner_dl_bert) word embeddings.
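NER models of this kind typically label every token with an IOB-style tag (B-PER, I-PER, O, ...), and those tags are then merged into entity chunks. The plain-Python sketch below illustrates only that merging step. It assumes IOB tags as input, the function name is hypothetical, and it is not the Spark NLP API (in a Spark NLP pipeline this step is handled by an annotator such as NerConverter).

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (chunk text, entity label)


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Chunk]:
    """Merge per-token IOB tags (e.g. B-PER, I-PER, O) into (text, label) chunks."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    chunks: List[Chunk] = []
    current: List[str] = []
    label: Optional[str] = None

    def flush() -> None:
        # Emit the open chunk, if any, and reset the state
        nonlocal current, label
        if current:
            chunks.append((" ".join(current), label))
        current, label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            current.append(token)  # continue the open chunk
        elif tag.startswith(("B-", "I-")):
            flush()  # a B- tag, or a stray I- tag, starts a new chunk
            current, label = [token], tag[2:]
        else:
            flush()  # "O" (or any other tag) closes the open chunk
    flush()
    return chunks


tokens = ["Alice", "Smith", "joined", "Acme", "Corp", "in", "Berlin", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(iob_to_chunks(tokens, tags))
# [('Alice Smith', 'PER'), ('Acme Corp', 'ORG'), ('Berlin', 'LOC')]
```

Treating a stray I- tag as the start of a new chunk is one common convention for imperfect model output; other tools may drop such tokens instead.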
Recognize more entities in text
Recognize 18 entity types such as Countries, People, Organizations, Products, Events, etc. using an out-of-the-box NerDL model (trained with NerDLApproach) pretrained on the OntoNotes corpus.
Classify documents
Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations or Numeric Values.
Analyze sentiment in movie reviews and tweets
Detect the general sentiment expressed in a movie review or tweet by using our pretrained Spark NLP DL classifier.
Detect emotions in tweets
Automatically identify Joy, Surprise, Fear and Sadness in tweets using our pretrained Spark NLP DL classifier.
Detect cyberbullying in tweets
Classify tweets as Racism, Sexism or Neutral using our pretrained cyberbullying classifier.
Detect sarcastic tweets
Check out our pretrained Spark NLP sarcasm detection model. It tells apart normal content from sarcastic content.
Detect toxic comments
Classify comments and tweets into the following categories: Toxic, Insults, Hate, Obscene and Threat.
Identify Fake news
Determine if news articles are Real or Fake.
Detect Spam messages
Automatically classify messages as regular messages or Spam.
Find text in a document
Find text in a document either by keyword or by regular expression.
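To make the two search modes concrete, here is a plain-Python sketch built on the standard re module. It is not Spark NLP code (the demo relies on Spark NLP annotators such as TextMatcher and RegexMatcher); the function names are hypothetical, and the returned spans use Python's exclusive end offsets.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, matched text)


def find_keywords(text: str, keywords: List[str], case_sensitive: bool = False) -> List[Span]:
    """Return every whole-word occurrence of any keyword, sorted by position."""
    flags = 0 if case_sensitive else re.IGNORECASE
    spans: List[Span] = []
    for keyword in keywords:
        # re.escape treats the keyword literally; \b restricts matches to whole words
        # (this assumes keywords begin and end with word characters)
        pattern = r"\b" + re.escape(keyword) + r"\b"
        spans.extend((m.start(), m.end(), m.group()) for m in re.finditer(pattern, text, flags))
    return sorted(spans)


def find_regex(text: str, pattern: str) -> List[Span]:
    """Return every non-overlapping match of a regular expression."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


text = "Call Dr. Smith on 2021-03-15 or email smith@example.com."
print([s[2] for s in find_keywords(text, ["smith"])])          # ['Smith', 'smith']
print([s[2] for s in find_regex(text, r"\d{4}-\d{2}-\d{2}")])  # ['2021-03-15']
```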
Grammar analysis & Dependency Parsing
Visualize the syntactic structure of a sentence as a directed labeled graph in which nodes are labeled with part-of-speech tags and arrows are labeled with dependency tags.
Split and clean text
Spark NLP pretrained annotators make it easy to process any type of text document. This demo showcases our Sentence Detector, Tokenizer, Stemmer, Lemmatizer, Normalizer and Stop Words Removal.
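For readers new to these steps, the sketch below approximates sentence splitting, tokenization, normalization and stop-word removal with simple regular expressions. The rules and the tiny stop-word list are illustrative assumptions and far cruder than the Spark NLP annotators; stemming and lemmatization are left out.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption, not Spark NLP's default list)
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def split_sentences(text: str) -> List[str]:
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    # Runs of word characters, or single punctuation characters
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalize(tokens: List[str]) -> List[str]:
    # Lowercase and drop pure-punctuation tokens
    return [t.lower() for t in tokens if re.search(r"\w", t)]


def remove_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


text = "The cat sat on the mat. Is it happy?"
for sentence in split_sentences(text):
    print(remove_stop_words(normalize(tokenize(sentence))))
# ['cat', 'sat', 'on', 'mat']
# ['it', 'happy']
```

A regex splitter like this breaks on abbreviations such as "Dr.", which is one reason the demo uses trained models instead of fixed rules.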
Spell check your text documents
The Spark NLP contextual spell checker quickly identifies typos and spelling issues in any text document.
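A basic building block of spell checking is edit distance: an out-of-vocabulary token is compared with known words and the closest one is suggested. The sketch below shows only that non-contextual step, with hypothetical function names and a made-up three-word vocabulary. A contextual spell checker like the one in this demo also uses the surrounding words to rank candidates, which this code does not do.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest vocabulary word for an out-of-vocabulary word, or None."""
    vocab = set(vocabulary)
    if word in vocab or not vocab:
        return None  # known word (or nothing to compare against)
    # Ties on distance are broken alphabetically so the result is deterministic
    best = min(vocab, key=lambda v: (edit_distance(word, v), v))
    return best if edit_distance(word, best) <= max_distance else None


vocabulary = ["patient", "diagnosis", "treatment"]
print(suggest("pateint", vocabulary))  # patient
print(suggest("patient", vocabulary))  # None
```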
Detect Key Phrases
Automatically detect key phrases in your text documents using out-of-the-box Spark NLP models.
Detect similar sentences
Automatically compute the similarity between two sentences using Spark NLP Universal Sentence Encoder embeddings.
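Sentence similarity is commonly scored as the cosine similarity between two sentence embedding vectors. The sketch below assumes the embeddings have already been computed and that cosine similarity is the scoring function; the demo's exact scoring method is not stated here, and the toy vectors are made up rather than real Universal Sentence Encoder output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real sentence embeddings
print(cosine_similarity([0.2, 0.9, 0.1], [0.4, 1.8, 0.2]))  # ~1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

Cosine similarity ignores vector length, so two sentences are judged by the direction of their embeddings only.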
Detect toxic content in comments
Automatically detect identity hate, insult, obscene, severe toxic, threat or toxic content in social media comments using our out-of-the-box Spark NLP MultiClassifierDL model.
Aspect based sentiment analysis for restaurants
Automatically detect positive, negative and neutral aspects about restaurants from the written feedback given by reviewers.
Detect sentences in text
Detect sentences from general purpose text documents using a deep learning model capable of understanding noisy sentence structures.
Detect and normalize dates
Automatically detect key phrases expressing dates and normalize them with respect to a reference date.
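Normalizing relative dates means resolving each expression against an anchor date. The plain-Python sketch below handles only a few hypothetical patterns (today, tomorrow, yesterday, "in N days/weeks", "N days/weeks ago") to show the idea. It is not how Spark NLP's date annotators are implemented, and real text needs far broader coverage.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Fixed offsets, in days, from the reference date
_FIXED = {"today": 0, "tomorrow": 1, "yesterday": -1}
# "in N day(s)/week(s)" or "N day(s)/week(s) ago"
_RELATIVE = re.compile(r"(?:in (\d+) (day|week)s?|(\d+) (day|week)s? ago)")


def normalize_date(phrase: str, reference: date) -> Optional[date]:
    """Resolve a relative date phrase against a reference date; None if unsupported."""
    p = phrase.strip().lower()
    if p in _FIXED:
        return reference + timedelta(days=_FIXED[p])
    m = _RELATIVE.fullmatch(p)
    if m is None:
        return None
    if m.group(1):  # "in N days/weeks" points forward
        n, unit, sign = int(m.group(1)), m.group(2), 1
    else:           # "N days/weeks ago" points backward
        n, unit, sign = int(m.group(3)), m.group(4), -1
    days = n * 7 if unit == "week" else n
    return reference + timedelta(days=sign * days)


ref = date(2021, 3, 15)
for phrase in ["tomorrow", "in 3 days", "2 weeks ago", "next Tuesday"]:
    print(phrase, "->", normalize_date(phrase, ref))
# tomorrow -> 2021-03-16
# in 3 days -> 2021-03-18
# 2 weeks ago -> 2021-03-01
# next Tuesday -> None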

Languages

Detect language
The Spark NLP Language Detector supports 20 languages: Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish and Ukrainian.
Recognize entities in English text
Recognize Persons, Locations, Organizations and Misc entities using out-of-the-box pretrained Deep Learning models based on GloVe (glove_100d) and BERT (ner_dl_bert) word embeddings.
Recognize entities in French text
Recognize Persons, Locations, Organizations and Misc entities using an out-of-the-box pretrained Deep Learning model and GloVe word embeddings (glove_100d).
Recognize entities in German text
Recognize Persons, Locations, Organizations and Misc entities using an out-of-the-box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Recognize entities in Italian text
Recognize Persons, Locations, Organizations and Misc entities using an out-of-the-box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Recognize entities in Norwegian text
Recognize Persons, Locations, Organizations and Misc entities using three out-of-the-box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).
Recognize entities in Polish text
Recognize Persons, Locations, Organizations and Misc entities using three out-of-the-box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).
Recognize entities in Portuguese text
Recognize Persons, Locations, Organizations and Misc entities using three out-of-the-box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).
Recognize entities in Russian text
Recognize Persons, Locations, Organizations and Misc entities using three out-of-the-box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).
Recognize entities in Spanish text
Recognize Persons, Locations, Organizations and Misc entities using three out-of-the-box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).
Recognize entities in Danish text
Recognize Persons, Locations, Organizations and Misc entities using an out-of-the-box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Recognize entities in Swedish text
Recognize Persons, Locations, Organizations and Misc entities using an out-of-the-box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Recognize entities in Finnish text
Recognize Persons, Locations, Organizations and Misc entities using an out-of-the-box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Prebuilt pipeline for entity recognition in Danish
This Spark NLP out-of-the-box pipeline returns tokens, lemmas, POS tags, embeddings and named entities in one line of code. It automatically recognizes Persons, Locations, Organizations and Misc entities in Danish text.
Prebuilt pipeline for entity recognition in Swedish
This Spark NLP out-of-the-box pipeline returns tokens, lemmas, POS tags, embeddings and named entities in one line of code. It automatically recognizes Persons, Locations, Organizations and Misc entities in Swedish text.
Prebuilt pipeline for entity recognition in Finnish
This Spark NLP out-of-the-box pipeline returns tokens, lemmas, POS tags, embeddings and named entities in one line of code. It automatically recognizes Persons, Locations, Organizations and Misc entities in Finnish text.

Healthcare

Detect signs and symptoms
Automatically identify Signs and Symptoms in clinical documents using two of our pretrained Spark NLP clinical models.
Detect diagnosis and procedures
Automatically identify diagnoses and procedures in clinical documents using the pretrained Spark NLP clinical model ner_clinical.
Detect drugs and prescriptions
Automatically identify Drug, Dosage, Duration, Form, Frequency, Route, and Strength details in clinical documents using three of our pretrained Spark NLP clinical models.
Detect risk factors
Automatically identify risk factors such as Coronary artery disease, Diabetes, Family history, Hyperlipidemia, Hypertension, Medications, Obesity, PHI, Smoking habits in clinical documents using our pretrained Spark NLP model.
Detect anatomical references
Automatically identify Anatomical System, Cell, Cellular Component, Anatomical Structure, Immaterial Anatomical Entity, Multi-tissue Structure, Organ, Organism Subdivision, Organism Substance, Pathological Formation in clinical documents using our pretrained Spark NLP model.
Detect demographic information
Automatically identify demographic information such as Date, Doctor, Hospital, ID number, Medical record, Patient, Age, Profession, Organization, State, City, Country, Street, Username, Zip code, Phone number in clinical documents using three of our pretrained Spark NLP models.
Detect clinical events
Automatically identify a variety of clinical events such as Problems, Tests, Treatments, Admissions or Discharges in clinical documents using two of our pretrained Spark NLP models.
Detect lab results
Automatically identify Lab test names and Lab results from clinical documents using our pretrained Spark NLP model.
Detect tumor characteristics
Automatically identify tumor characteristics such as Anatomical systems, Cancer, Cells, Cellular components, Genes and gene products, Multi-tissue structures, Organs, Organisms, Organism subdivisions, Simple chemicals, Tissues from clinical documents using our pretrained Spark NLP model.
Spell checking for clinical documents
Automatically identify spelling mistakes in clinical documents using our pretrained Spark NLP clinical spell checker.
Detect posology relations
Automatically identify relations between drugs, dosage, duration, frequency and strength using our pretrained clinical Relation Extraction (RE) model.
Detect causality between symptoms and treatment
Automatically identify relations between symptoms and treatment using our pretrained clinical Relation Extraction (RE) model.
Detect temporal relations for clinical events
Automatically identify three types of relations between clinical events: After, Before and Overlap using our pretrained clinical Relation Extraction (RE) model.
SNOMED coding
Automatically resolve the SNOMED codes corresponding to the diseases and conditions mentioned in your health records using Spark NLP for Healthcare out of the box.
ICD-O coding
Automatically detect tumors in your healthcare records and link them to the corresponding ICD-O codes using Spark NLP for Healthcare out of the box.
ICD-10-CM coding
Automatically detect pre- and post-operative diagnoses, signs and symptoms or other findings in your healthcare records and link them to the corresponding ICD-10-CM codes using Spark NLP for Healthcare out of the box.
RxNorm coding
Automatically detect the drug and treatment names mentioned in your prescriptions or healthcare records and link them to the corresponding RxNorm codes using Spark NLP for Healthcare out of the box.
Detect demographics and vital signs using rules
Automatically detect demographic information as well as vital signs using our out-of-the-box Spark NLP Contextual Rules. Custom rules are very easy to define and run on your own data.
Detect chemical compounds and genes
Automatically detect mentions of chemical compounds and genes using our pretrained chemprot model included in Spark NLP for Healthcare.
Detect genes and human phenotypes
Automatically detect mentions of genes and human phenotypes (hp) in medical text using Spark NLP for Healthcare pretrained models.
Detect normalized genes and human phenotypes
Automatically detect normalized mentions of genes (go) and human phenotypes (hp) in medical text using Spark NLP for Healthcare pretrained models.
ICD-10 coding for German
Automatically detect pre- and post-operative diagnoses, signs and symptoms in your German healthcare records and link them to the corresponding ICD-10-CM codes using Spark NLP for Healthcare out of the box.
Detect symptoms, treatments and other NERs in German
Automatically identify entities such as symptoms, diagnoses, procedures, body parts or medication in German clinical text using the pretrained Spark NLP clinical model ner_healthcare.
Detect legal entities in German
Automatically identify entities such as persons, judges, lawyers, countries, cities, landscapes, organizations, courts, trademark laws, contracts, etc. in German legal text using the pretrained Spark NLP model ner_legal.
Adverse drug events tagger
An automatic pipeline that classifies documents as containing or not containing descriptions of adverse drug events, then identifies those events.
Identify diagnosis and symptoms assertion status
Automatically detect whether a diagnosis or a symptom is present, absent, uncertain or associated with someone other than the patient (e.g. family members).
Detect cell structure, DNA, RNA and protein
Automatically detect cell type, cell line, DNA, RNA and protein information using our pretrained Spark NLP for Healthcare model.
Link entities to Wikipedia pages
Automatically disambiguate people’s names based on their context and link them to corresponding Wikipedia pages using out of the box Spark NLP pretrained models.
Detect sentences in healthcare documents
Automatically detect sentences in noisy healthcare documents with our pretrained Sentence Splitter DL model.

Spark OCR

PDF to Text
Extract text from generated/selectable PDF documents and keep the original structure of the document by using our out-of-the-box Spark OCR library.
DICOM to Text
Recognize text in DICOM documents. This feature extracts both the text shown in the image and the text stored in the metadata.
Image to Text
Recognize text in images and scanned PDF documents by using our out-of-the-box Spark OCR library.
Remove background noise from scanned documents
Removing background noise from a scanned document can significantly improve OCR results. Spark OCR is the only library that allows you to fine-tune the image preprocessing for excellent OCR results.
Correct skewness in scanned documents
Correcting the skew of your scanned documents can significantly improve OCR results. Spark OCR is the only library that allows you to fine-tune the image preprocessing for excellent OCR results.
Recognize text in natural scenes
By using image segmentation and preprocessing techniques, Spark OCR recognizes and extracts text from natural scenes.
Recognize entities in scanned PDFs
End-to-end example of a regular NER pipeline: import scanned images from cloud storage, preprocess them to improve their quality, recognize text using Spark OCR, correct spelling mistakes to improve the OCR results, and finally run NER to extract entities.
Extract tables from PDFs
Extract tables from selectable PDF documents with the new features offered by Spark OCR.

De-identification

De-identify structured data
De-identify PHI in structured datasets using out-of-the-box Spark NLP functionality that enforces GDPR and HIPAA compliance, while maintaining the linkage of clinical data across files.
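One common way to keep records linkable after de-identification is deterministic pseudonymization: the same identifier always maps to the same surrogate, so rows from different files can still be joined. The sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library) as an assumed technique for illustration only; it is not necessarily how Spark NLP maintains linkage, and the function name, key and records are hypothetical. The key must stay private, because anyone holding it can re-derive pseudonyms from candidate names.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, prefix: str = "PID") -> str:
    """Map an identifier to a stable surrogate: same value + same key -> same output."""
    # Light normalization so " Jane  Doe" and "jane doe" map to the same surrogate
    canonical = " ".join(value.split()).lower()
    digest = hmac.new(secret_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


key = b"replace-with-a-securely-stored-secret"  # hypothetical key
admissions = [{"patient": "Jane Doe", "ward": "B2"}]
lab_results = [{"patient": " jane  doe", "test": "HbA1c"}]

for row in admissions + lab_results:
    row["patient"] = pseudonymize(row["patient"], key)

# Both files now carry the same surrogate, so they can still be joined
print(admissions[0]["patient"] == lab_results[0]["patient"])  # True
```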
De-identify free text documents
De-identify free text documents by either masking or obfuscating PHI using out-of-the-box Spark NLP models that enforce GDPR and HIPAA compliance.
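The difference between masking and obfuscation can be shown once PHI spans have been found. The sketch below assumes the spans are already available (in practice they would come from a de-identification NER model), that they do not overlap, and that surrogate values are supplied by the caller. The function and labels are hypothetical and not the Spark NLP API.

```python
from typing import Dict, List, Optional, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, PHI label)


def deidentify(text: str, entities: List[Entity], mode: str = "mask",
               surrogates: Optional[Dict[str, str]] = None) -> str:
    """Replace PHI spans with <LABEL> tags ("mask") or fake values ("obfuscate")."""
    if mode not in ("mask", "obfuscate"):
        raise ValueError("mode must be 'mask' or 'obfuscate'")
    surrogates = surrogates or {}
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(entities):  # assumes non-overlapping spans
        pieces.append(text[cursor:start])        # keep the text before the entity
        if mode == "obfuscate" and label in surrogates:
            pieces.append(surrogates[label])     # realistic-looking fake value
        else:
            pieces.append(f"<{label}>")          # masking (also the fallback)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "John Smith was seen at Mercy Hospital on 2021-03-15."
entities = [(0, 10, "PATIENT"), (23, 37, "HOSPITAL"), (41, 51, "DATE")]
print(deidentify(text, entities))
# <PATIENT> was seen at <HOSPITAL> on <DATE>.
print(deidentify(text, entities, "obfuscate", {"PATIENT": "Jane Doe", "HOSPITAL": "General Clinic"}))
# Jane Doe was seen at General Clinic on <DATE>.
```

Falling back to a mask when no surrogate is available keeps the output safe: a PHI span is never passed through unchanged.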
De-identify DICOM documents
De-identify DICOM documents by masking PHI in the image and by either masking or obfuscating PHI in the metadata.
De-identify PDF documents - HIPAA Compliance
De-identify PDF documents using HIPAA guidelines by masking PHI using out-of-the-box Spark NLP models.
De-identify PDF documents - GDPR Compliance
De-identify PDF documents using GDPR guidelines by anonymizing PHI using out-of-the-box Spark NLP models.