Open Source
Recognize entities in text
Recognize Persons, Locations, Organizations and Misc entities using out of the box pretrained Deep Learning models based on GloVe (glove_100d) and BERT (ner_dl_bert) word embeddings.
Recognize more entities in text
Recognize over 18 entities such as Countries, People, Organizations, Products, Events, etc. using an out of the box pretrained NerDLApproach trained on the OntoNotes corpus.
Classify documents
Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations or Numeric Values.
Analyze sentiment in movie reviews and tweets
Detect the general sentiment expressed in a movie review or tweet by using our pretrained Spark NLP DL classifier.
Detect emotions in tweets
Automatically identify Joy, Surprise, Fear and Sadness in tweets using our pretrained Spark NLP DL classifier.
Detect cyberbullying in tweets
Identify racist, sexist or neutral tweets using our pretrained cyberbullying classifier.
Detect sarcastic tweets
Check out our pretrained Spark NLP sarcasm detection model. It can tell sarcastic content apart from normal content.
Detect Spam messages
Automatically identify messages as being regular messages or Spam.
Find text in a document
Find text in a document either by keyword or by regular expression.
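As an illustration of the two matching modes, here is a minimal Python sketch that uses only the standard library re module. It is not the Spark NLP API; the function names find_keyword and find_regex are hypothetical and exist only for this example.

```python
import re


def find_keyword(text, keyword):
    """Return (start, end) character offsets of every case-insensitive keyword hit."""
    # re.escape makes the keyword match literally, even if it contains
    # characters that would otherwise be regex metacharacters.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


def find_regex(text, pattern):
    """Return (matched_text, start, end) for every match of a regular expression."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


doc = "Order A-102 shipped on Monday. Order B-7 is pending."
print(find_keyword(doc, "order"))
print(find_regex(doc, r"[A-Z]-\d+"))
```

Keyword search escapes its input so that a literal such as "A.1" is not treated as a pattern, while the regex variant passes the pattern through unchanged.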
Grammar analysis & Dependency Parsing
Visualize the syntactic structure of a sentence as a directed labeled graph where nodes are labeled with the part of speech tags and arrows contain the dependency tags.
Split and clean text
Spark NLP pretrained annotators allow an easy and straightforward processing of any type of text documents. This demo showcases our Sentence Detector, Tokenizer, Stemmer, Lemmatizer, Normalizer and Stop Words Removal.
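To make the individual cleaning stages concrete, the following is a rough sketch in plain Python of sentence splitting, tokenization, normalization and stop-word removal. It does not use Spark NLP and omits stemming and lemmatization; the rules, the function names and the tiny stop-word list are assumptions made for illustration.

```python
import re

# A tiny, illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def split_sentences(text):
    """Naive rule: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalize(tokens):
    """Lowercase tokens and drop punctuation-only tokens."""
    return [t.lower() for t in tokens if re.search(r"\w", t)]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


text = "The cat is on the mat. Dogs are loud!"
for sentence in split_sentences(text):
    print(remove_stop_words(normalize(tokenize(sentence))))
```

Each stage takes the output of the previous one, which mirrors the general idea of chaining annotators, although a rule this simple will mis-split abbreviations such as "Dr. Smith".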
Spell check your text documents
Spark NLP contextual spellchecker allows the quick identification of typos or spell issues within any text document.
Detect Key Phrases
Automatically detect key phrases in your text documents using out-of-the-box Spark NLP models.
Detect similar sentences
Automatically compute the similarity between two sentences using Spark NLP Universal Sentence Embeddings.
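One common way to compare two sentence embeddings is cosine similarity. The description above does not say which measure the demo uses, so treat that choice as an assumption. The sketch below uses plain Python and toy 4-dimensional vectors in place of real sentence embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for embeddings of three sentences (made-up values).
emb_a = [0.2, 0.1, 0.9, 0.4]
emb_b = [0.25, 0.05, 0.8, 0.5]
emb_c = [-0.7, 0.6, -0.1, 0.0]

print(cosine_similarity(emb_a, emb_b))  # close to 1: similar direction
print(cosine_similarity(emb_a, emb_c))  # negative: dissimilar direction
```

Because the score depends only on direction, not on vector length, it stays comparable across sentences of different lengths.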
Detect toxic content in comments
Automatically detect identity hate, insult, obscene, severe toxic, threat or toxic content in social media comments using our out-of-the-box Spark NLP Multiclassifier DL.
Aspect based sentiment analysis for restaurants
Automatically detect positive, negative and neutral aspects about restaurants from the written feedback given by reviewers.
Detect sentences in text
Detect sentences from general purpose text documents using a deep learning model capable of understanding noisy sentence structures.
Detect and normalize dates
Automatically detect key phrases expressing dates and normalize them with respect to a reference date.
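The idea of normalizing with respect to a reference date can be sketched with the Python standard library. This is a toy rule-based version, not the Spark NLP annotator; the phrase list, the function name normalize_dates and the ISO output format are assumptions chosen for the example.

```python
import re
from datetime import date, timedelta

# Offsets, in days, for a few relative expressions (illustrative, not exhaustive).
RELATIVE_OFFSETS = {
    "yesterday": -1,
    "today": 0,
    "tomorrow": 1,
    "next week": 7,
    "last week": -7,
}


def normalize_dates(text, reference):
    """Return (phrase, ISO date) pairs for relative date expressions found in text."""
    # Longest alternatives first so multi-word phrases are tried before shorter ones.
    relative = "|".join(sorted(map(re.escape, RELATIVE_OFFSETS), key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{relative}|\d+ days? ago|in \d+ days?)\b", re.IGNORECASE)
    results = []
    for match in pattern.finditer(text):
        phrase = match.group(0)
        key = phrase.lower()
        if key in RELATIVE_OFFSETS:
            offset = RELATIVE_OFFSETS[key]
        else:
            # "N days ago" counts backwards, "in N days" counts forwards.
            days = int(re.search(r"\d+", key).group(0))
            offset = -days if key.endswith("ago") else days
        results.append((phrase, (reference + timedelta(days=offset)).isoformat()))
    return results


ref = date(2021, 3, 15)
print(normalize_dates("She was admitted yesterday and returns in 10 days.", ref))
```

The same phrase resolves to a different calendar date whenever the reference date changes, which is why the reference is an explicit argument.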
Understand questions about Airline Traffic
Automatically detect key entities related to airline traffic, such as departure and arrival times and locations.
Automatically answer questions
Automatically generate answers to questions, with or without context.
Infer word meaning from context
Compare the meaning of words in two different sentences and evaluate ambiguous pronouns.
Assess relationship between two sentences
Evaluate the relationship between two sentences or text fragments to identify things such as contradictions, entailments, and premises and hypotheses.
Evaluate sentence grammar
Classify a sentence as grammatically correct or incorrect.
Understand intent and actions in general commands
Extract intents in general commands related to music, restaurants, movies.
Languages
Detect language
Spark NLP Language Detector offers support for 20 different languages: Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian.


Recognize entities in English text
Recognize Persons, Locations, Organizations and Misc entities using out of the box pretrained Deep Learning models based on GloVe (glove_100d) and BERT (ner_dl_bert) word embeddings.
Recognize entities in French text
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_100d).
Recognize entities in German text
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Recognize entities in Italian text
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_300d).


Recognize entities in Norwegian text
Recognize Persons, Locations, Organizations and Misc entities using 3 different out of the box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).


Recognize entities in Polish text
Recognize Persons, Locations, Organizations and Misc entities using 3 different out of the box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).


Recognize entities in Portuguese text
Recognize Persons, Locations, Organizations and Misc entities using 3 different out of the box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).


Recognize entities in Russian text
Recognize Persons, Locations, Organizations and Misc entities using 3 different out of the box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).


Recognize entities in Spanish text
Recognize Persons, Locations, Organizations and Misc entities using 3 different out of the box pretrained Deep Learning models based on different GloVe word embeddings (glove_100d & glove_300d).


Recognize entities in Danish text
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_300d).


Recognize entities in Swedish text
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_300d).


Recognize entities in Finnish text
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_300d).
Prebuilt pipeline for entity recognition in Danish
This out-of-the-box Spark NLP pipeline returns tokens, lemmas, POS tags, embeddings and named entities in one line of code. It automatically recognizes Persons, Locations, Organizations and Misc entities in Danish text.
Prebuilt pipeline for entity recognition in Swedish
This out-of-the-box Spark NLP pipeline returns tokens, lemmas, POS tags, embeddings and named entities in one line of code. It automatically recognizes Persons, Locations, Organizations and Misc entities in Swedish text.
Prebuilt pipeline for entity recognition in Finnish
This out-of-the-box Spark NLP pipeline returns tokens, lemmas, POS tags, embeddings and named entities in one line of code. It automatically recognizes Persons, Locations, Organizations and Misc entities in Finnish text.


Recognize entities in Turkish text
Recognize Persons, Locations and Organization entities using an out of the box pretrained Deep Learning model and multilingual BERT word embeddings.






Recognize entities in Arabic text
Recognize Persons, Locations and Organization entities using an out of the box pretrained Deep Learning model and language specific embeddings.






Recognize entities in Persian text
Recognize Persons, Locations and Organization entities using an out of the box pretrained Deep Learning model and language specific embeddings.






Recognize entities in Hebrew text
Recognize Persons, Locations and Organization entities using an out of the box pretrained Deep Learning model and language specific embeddings.






Recognize entities in Japanese text
Recognize Persons, Locations and Organization entities using an out of the box pretrained Deep Learning model and language specific embeddings.










Recognize entities in Urdu text
Recognize Persons, Locations and other entities using an out of the box pretrained Deep Learning model and language specific embeddings.






Recognize entities in Korean text
Recognize Persons, Locations and other entities using an out of the box pretrained Deep Learning model and language specific embeddings.






Recognize entities in Chinese text
Recognize Persons, Locations and other entities using an out of the box pretrained Deep Learning model and language specific embeddings.










Analyze sentiment in Urdu movie reviews
Detect the general sentiment expressed in a movie review or tweet by using our pretrained Spark NLP sentiment analysis model for Urdu language.
Translate text in more than 192 languages
Translate text in more than 192 languages using pretrained Deep Learning models.






Recognize Entities in Bengali
Recognize Persons, Locations, Organizations and Misc entities using an out of the box pretrained Deep Learning model and GloVe word embeddings (glove_840b_300d).






Translate Hausa text
Translate Hausa to English using our pre-trained model.






Translate Swahili text
Translate Swahili to English using our pre-trained model.






Translate Afrikaans text
Translate Afrikaans to English using our pre-trained model.






Turkish News Classifier
Classify Turkish news text using our pre-trained model.
Healthcare
Detect signs and symptoms
Automatically identify Signs and Symptoms in clinical documents using two of our pretrained Spark NLP clinical models.
Detect diagnosis and procedures
Automatically identify diagnoses and procedures in clinical documents using the pretrained Spark NLP clinical model ner_clinical.
Detect drugs and prescriptions
Automatically identify Drug, Dosage, Duration, Form, Frequency, Route, and Strength details in clinical documents using three of our pretrained Spark NLP clinical models.
Detect risk factors
Automatically identify risk factors such as Coronary artery disease, Diabetes, Family history, Hyperlipidemia, Hypertension, Medications, Obesity, PHI, Smoking habits in clinical documents using our pretrained Spark NLP model.
Detect anatomical references
Automatically identify Anatomical System, Cell, Cellular Component, Anatomical Structure, Immaterial Anatomical Entity, Multi-tissue Structure, Organ, Organism Subdivision, Organism Substance, Pathological Formation in clinical documents using our pretrained Spark NLP model.
Detect demographic information
Automatically identify demographic information such as Date, Doctor, Hospital, ID number, Medical record, Patient, Age, Profession, Organization, State, City, Country, Street, Username, Zip code, Phone number in clinical documents using three of our pretrained Spark NLP models.
Detect clinical events
Automatically identify a variety of clinical events such as Problems, Tests, Treatments, Admissions or Discharges, in clinical documents using two of our pretrained Spark NLP models.
Detect lab results
Automatically identify Lab test names and Lab results from clinical documents using our pretrained Spark NLP model.
Detect tumor characteristics
Automatically identify tumor characteristics such as Anatomical systems, Cancer, Cells, Cellular components, Genes and gene products, Multi-tissue structures, Organs, Organisms, Organism subdivisions, Simple chemicals, Tissues from clinical documents using our pretrained Spark NLP model.
Spell checking for clinical documents
Automatically identify from clinical documents using our pretrained Spark NLP model ner_bionlp.
Detect posology relations
Automatically identify relations between drugs, dosage, duration, frequency and strength using our pretrained clinical Relation Extraction (RE) model.
Detect causality between symptoms and treatment
Automatically identify relations between symptoms and treatment using our pretrained clinical Relation Extraction (RE) model.
Detect temporal relations for clinical events
Automatically identify three types of relations between clinical events: After, Before and Overlap using our pretrained clinical Relation Extraction (RE) model.
SNOMED coding
Automatically resolve the SNOMED code corresponding to the diseases and conditions mentioned in your health record using Spark NLP for Healthcare out of the box.
ICDO coding
Automatically detect the tumor in your healthcare records and link it to the corresponding ICDO code using Spark NLP for Healthcare out of the box.
ICD10-CM coding
Automatically detect pre- and post-op diagnoses, signs and symptoms or other findings in your healthcare records and link them to the corresponding ICD10-CM codes using Spark NLP for Healthcare out of the box.
RxNORM coding
Automatically detect the drug and treatment names mentioned in your prescriptions or healthcare records and link them to the corresponding RxNORM codes using Spark NLP for Healthcare out of the box.
Detect demographics and vital signs using rules
Automatically detect demographic information as well as vital signs using our out-of-the-box Spark NLP Contextual Rules. Custom rules are very easy to define and run on your own data.
Detect chemical compounds and genes
Automatically detect all chemical compounds and gene mentions using our pretrained chemprot model included in Spark NLP for Healthcare.
Detect genes and human phenotypes
Automatically detect mentions of genes and human phenotypes (hp) in medical text using Spark NLP for Healthcare pretrained models.
Detect normalized genes and human phenotypes
Automatically detect normalized mentions of genes (go) and human phenotypes (hp) in medical text using Spark NLP for Healthcare pretrained models.
ICD10 coding for German
Automatically detect pre- and post-op diagnoses, signs and symptoms in your German healthcare records and link them to the corresponding ICD10-CM codes using Spark NLP for Healthcare out of the box.
Detect symptoms, treatments and other NERs in German
Automatically identify entities such as symptoms, diagnoses, procedures, body parts or medication in German clinical text using the pretrained Spark NLP clinical model ner_healthcare.
Detect legal entities in German
Automatically identify entities such as persons, judges, lawyers, countries, cities, landscapes, organizations, courts, trademark laws, contracts, etc. in German legal text using the pretrained Spark NLP model ner_legal.
Adverse drug events tagger
Automatic pipeline that tags documents as containing or not containing adverse events description, then identifies those events.
Identify diagnosis and symptoms assertion status
Automatically detect whether a diagnosis or a symptom is present, absent, uncertain or associated with other persons (e.g. family members).
Detect cell structure, DNA, RNA and protein
Automatically detect cell type, cell line, DNA and RNA information using our pretrained Spark NLP for Healthcare model.
Link entities to Wikipedia pages
Automatically disambiguate people’s names based on their context and link them to corresponding Wikipedia pages using out of the box Spark NLP pretrained models.
Detect sentences in healthcare documents
Automatically detect sentences in noisy healthcare documents with our pretrained Sentence Splitter DL model.
Classify medical text according to PICO framework
Automatically classify medical text in PICO components: Participants/Problem, Intervention, Comparison, and Outcome.
Detect chemical compounds
Automatically detect all types of chemical compounds using our pretrained Spark NLP for Healthcare model.
Detect bacteria, plants, animals or general species
Automatically detect bacteria, plants, animals, and other species using our pretrained Spark NLP for Healthcare model.
Generate SQL queries from natural language
Automatically generate valid SQL queries from raw text using our unique DL generative model.
Detect traffic information in text
Automatically extract geographical location, postal codes, and traffic routes in German text using our pretrained Spark NLP model.
Identify gender using context and medical records
Identify the gender of a person by analyzing signs and symptoms using a pretrained Spark NLP classification model.
Detect clinical entities in text
Automatically detect more than 50 clinical entities using our NER deep learning model.
Detect Clinical Entities in Radiology Reports
Automatically identify entities such as body parts, imaging tests, imaging results and diseases using a pre-trained Spark NLP model.
Normalize Medication-related Phrases
Normalize medication-related phrases such as dosage, form and strength, as well as abbreviations in text and named entities extracted by NER models.
Detect relations between body parts and clinical entities
Use pre-trained relation extraction models to extract relations between body parts and clinical entities.
Detect Diagnoses And Procedures In Spanish
Automatically identify diagnoses and procedures in Spanish clinical documents using the pre-trained Spark NLP clinical model.
Logical Observation Identifiers Names and Codes (LOINC)
Map clinical NER entities to Logical Observation Identifiers Names and Codes (LOINC) using our pre-trained model.
Map Healthcare Codes
These pretrained pipelines map various codes (e.g., ICD10CM codes to SNOMED codes) without using any text data.
Spark OCR
PDF to Text
Extract text from generated/selectable PDF documents and keep the original structure of the document by using our out-of-the-box Spark OCR library.
DICOM to Text
Recognize text in DICOM format documents. This feature extracts both the text on the image and the text from the metadata file.
Image to Text
Recognize text in images and scanned PDF documents by using our out-of-the-box Spark OCR library.
Remove background noise from scanned documents
Removing the background noise in a scanned document greatly improves OCR results. Spark OCR is the only library that allows you to fine-tune the image preprocessing for excellent OCR results.
Correct skewness in scanned documents
Correcting the skewness of your scanned documents greatly improves OCR results. Spark OCR is the only library that allows you to fine-tune the image preprocessing for excellent OCR results.
Recognize text in natural scenes
Using image segmentation and preprocessing techniques, Spark OCR recognizes and extracts text from natural scenes.
Recognize entities in scanned PDFs
End-to-end example of regular NER pipeline: import scanned images from cloud storage, preprocess them for improving their quality, recognize text using Spark OCR, correct the spelling mistakes for improving OCR results and finally run NER for extracting entities.
Extract tables
Extract tables from selectable PDF documents with the new features offered by Spark OCR.
Extract Data from FoundationOne Sequencing Reports
Use our transformer to parse patient info, genomic and biomarker findings, and gene lists.
Enhance Faxes or Scanned Documents
Improve quality of (old) faxes/scanned documents using Spark OCR.
Enhance Photo of Documents
Improve quality of documents in image format using Spark OCR.
PDF to Text (Non-English Text)
Extract non-English text from generated/selectable PDF documents and keep the original structure of the document by using our out-of-the-box Spark OCR library.
Image to Text (Non-English Text)
Recognize non-English text in images and scanned PDF documents by using our out-of-the-box Spark OCR library.
DOCX to Text (Non-English Text)
Extract non-English text from Word documents using our out-of-the-box Spark OCR library.
De-identification
Deidentify structured data
Deidentify PHI in structured datasets using out of the box Spark NLP functionality that enforces GDPR and HIPAA compliance, while maintaining linkage of clinical data across files.
Deidentify free text documents
Deidentify free text documents by either masking or obfuscating PHI using out of the box Spark NLP models that enforce GDPR and HIPAA compliance.
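The difference between masking and obfuscating can be shown with a small standard-library sketch. This is a toy illustration, not the Spark NLP de-identification models: the two regex patterns, the fixed surrogate values and the function name deidentify are all assumptions, and production systems detect PHI with trained NER models rather than regexes.

```python
import re

# Toy patterns for two PHI types (hypothetical, far from complete).
PHI_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

# Made-up surrogate values used when obfuscating instead of masking.
SURROGATES = {"PHONE": "555-010-0000", "DATE": "2000-01-01"}


def deidentify(text, mode="mask"):
    """Replace PHI with an entity tag ("mask") or a fake value ("obfuscate")."""
    if mode not in ("mask", "obfuscate"):
        raise ValueError("mode must be 'mask' or 'obfuscate'")
    for label, pattern in PHI_PATTERNS.items():
        replacement = f"<{label}>" if mode == "mask" else SURROGATES[label]
        text = pattern.sub(replacement, text)
    return text


note = "Patient called from 617-555-0199 on 2021-03-15."
print(deidentify(note, mode="mask"))
print(deidentify(note, mode="obfuscate"))
```

Masking makes it obvious that something was removed, while obfuscation keeps the text readable as a natural document by substituting plausible values.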
Deidentify DICOM documents
Deidentify DICOM documents by masking PHI on the image and by either masking or obfuscating PHI in the metadata.
De-identify PDF documents - HIPAA Compliance
De-identify PDF documents using HIPAA guidelines by masking PHI using out of the box Spark NLP models.
De-identify PDF documents - GDPR Compliance
De-identify PDF documents using GDPR guidelines by anonymizing PHI using out of the box Spark NLP models.