Explain Document DL – Spark NLP Pretrained Pipeline

Spark NLP Short Blogpost Series: 1

Spark NLP offers several pre-trained models in four languages (English, French, German, Italian). All you need to do is download a pre-trained model by specifying its name, then configure its parameters for your use case and dataset. You do not need to train a new model from scratch, and you can apply pre-trained state-of-the-art (SOTA) algorithms directly to your own data with transform().

Let’s see how we can use the explain_document_dl pre-trained pipeline in Python.

We start by importing the required modules.

Now we load a pipeline model that contains the following annotators by default:

  • Tokenizer
  • Deep Sentence Detector
  • Lemmatizer
  • Stemmer
  • Part of Speech (POS)
  • Spell Checker (NorvigSweetingModel)
  • Word Embeddings (GloVe)
  • NER-DL (named entity recognition trained with a SOTA deep learning algorithm)
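The import and loading steps described above can be sketched roughly as follows. This is a minimal sketch, assuming the spark-nlp and pyspark packages are installed and the machine can download the pipeline on first use. The helper names load_explain_document_dl and stage_names are hypothetical and introduced only for illustration; sparknlp.start() and PretrainedPipeline are the library calls.

```python
def load_explain_document_dl(lang="en"):
    """Start Spark NLP and load the explain_document_dl pretrained pipeline.

    Hypothetical helper for illustration. The imports live inside the
    function so that the heavy dependency is only needed when it is called.
    """
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    sparknlp.start()  # starts (or reuses) a Spark session with Spark NLP loaded
    return PretrainedPipeline("explain_document_dl", lang=lang)


def stage_names(pipeline_model):
    """Return the class name of every stage in a fitted pipeline model.

    Works on any object that exposes a `stages` list, such as a
    Spark ML PipelineModel.
    """
    return [type(stage).__name__ for stage in pipeline_model.stages]
```

Calling stage_names(load_explain_document_dl().model) should list the annotators bundled in the pipeline, which is a quick way to compare what was actually downloaded against the bullet list above.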

We simply pass the text we want to transform to the pipeline's annotate() method, and the pipeline does the work.

As you can see, we have misspelled two words: "beautful" and "pictre".

We can see the output of each annotator below. This one is doing so many things at once!
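One way to read all the annotator outputs side by side is to line them up per token. The sketch below assumes that annotate() returns a dictionary of string lists and that the token-level keys are named token, checked, lemma, stem, pos and ner, aligned one-to-one. The helper names and the sample dictionary are hypothetical; the sample is hand-written for illustration and is not real model output.

```python
TOKEN_LEVEL_KEYS = ("token", "checked", "lemma", "stem", "pos", "ner")


def token_table(result, keys=TOKEN_LEVEL_KEYS):
    """Zip the token-level lists of an annotate() result into one row per token."""
    columns = [result[key] for key in keys]
    if len({len(column) for column in columns}) > 1:
        raise ValueError("token-level outputs are not aligned one-to-one")
    return list(zip(*columns))


def format_table(rows, headers=TOKEN_LEVEL_KEYS):
    """Render the rows as left-aligned, space-padded text columns."""
    widths = [max(len(str(cell)) for cell in column)
              for column in zip(headers, *rows)]
    lines = ["  ".join(str(cell).ljust(width) for cell, width in zip(row, widths))
             for row in [headers, *rows]]
    return "\n".join(line.rstrip() for line in lines)


# Hand-written stand-in for pipeline.annotate(...); NOT real model output.
sample = {
    "token":   ["A", "beautful", "pictre"],
    "checked": ["A", "beautiful", "picture"],
    "lemma":   ["A", "beautiful", "picture"],
    "stem":    ["a", "beauti", "pictur"],
    "pos":     ["DT", "JJ", "NN"],
    "ner":     ["O", "O", "O"],
}

print(format_table(token_table(sample)))
```

With a real pipeline, the sample dictionary would be replaced by the value returned from annotate(); the alignment check guards against the case where an annotator emits a different number of items than the tokenizer.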

As you can see, the misspelled words are also fixed correctly.
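To check which words the spell checker changed, the original tokens can be compared with the corrected ones. This is a minimal sketch, assuming the annotate() result exposes the raw tokens under token and the spell-checked tokens under checked, with one corrected entry per token. The function name spelling_corrections and the example dictionary are hypothetical, and the example values are hand-written rather than produced by the model.

```python
def spelling_corrections(result):
    """Return (original, corrected) pairs for every token the checker changed."""
    tokens, checked = result["token"], result["checked"]
    if len(tokens) != len(checked):
        raise ValueError("'token' and 'checked' must have the same length")
    return [(original, fixed)
            for original, fixed in zip(tokens, checked)
            if original != fixed]


# Hand-written stand-in for an annotate() result; NOT real model output.
example = {
    "token":   ["a", "beautful", "pictre"],
    "checked": ["a", "beautiful", "picture"],
}

for original, fixed in spelling_corrections(example):
    print(f"{original} -> {fixed}")
```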

We hope that you have already read the previous articles on our official Medium page and started to play with Spark NLP. Here are the links to the other articles. Don’t forget to follow our page and stay tuned!

Here is the notebook for the code shared above.

Introduction to Spark NLP: Foundations and Basic Components (Part-I)

Introduction to Spark NLP: Installation and Getting Started (Part-II)

Spark NLP 101: Document Assembler

Spark NLP 101: LightPipeline