

Intent and Action Classification, analyze Chinese News and Crypto market, 200+ languages & answer questions with NLU 1.1.3

We are very excited to announce that the latest NLU release comes with a new pretrained Intent Classifier and NER Action Extractor, both trained on the SNIPS dataset, for text related to music, restaurants, and movies. Make sure to check out the Models Hub and the easy 1-liners for more info!

In addition, new NER and Embedding models for Bengali are now available.

Finally, there is a new NLU Webinar with 9 accompanying tutorial notebooks, organized into the following parts:

  • Part 1: Easy 1 Liners
    • Spell checking/Sentiment/POS/NER/BERTology embeddings
  • Part 2: Data analysis and NLP tasks on Crypto News Headline dataset
    • Preprocess the data, extract Emotions, Keywords, and Named Entities, and visualize them
  • Part 3: NLU Multi-Lingual 1 Liners with Microsoft’s Marian Models
    • Translate between 200+ languages (and classify lang afterward)
  • Part 4: Data analysis and NLP tasks on Chinese News Article Dataset
    • Word Segmentation, Lemmatization, Extract Keywords, Named Entities and translate to English
  • Part 5: Train a sentiment Classifier that understands 100+ Languages
  • Part 6: Question answering, Summarization, Squad, and more with Google’s T5

New Models

NLU 1.1.3 New Non-English Models

| Language | nlu.load() reference | Spark NLP Model reference | Type |
|----------|----------------------|---------------------------|------|
| Bengali | bn.ner.cc_300d | bengaliner_cc_300d | NerDLModel |
| Bengali | bn.embed | bengali_cc_300d | Word Embeddings Model |
| Bengali | bn.embed.cc_300d | bengali_cc_300d | Word Embeddings Model (Alias) |
| Bengali | bn.embed.glove | bengali_cc_300d | Word Embeddings Model (Alias) |

NLU 1.1.3 New English Models

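| Language | nlu.load() reference | Spark NLP Model reference | Type |
|----------|----------------------|---------------------------|------|
| English | en.classify.snips | nerdl_snips_100d | NerDLModel |
| English | en.ner.snips | classifierdl_use_snips | ClassifierDLModel |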

New NLU Webinar

State-of-the-art Natural Language Processing for 200+ Languages with 1 Line of code

Talk Abstract

Learn to harness the power of 1,000+ production-grade & scalable NLP models for 200+ languages – all available with just 1 line of Python code by leveraging the open-source NLU library, which is powered by the widely popular Spark NLP.

John Snow Labs has delivered over 80 releases of Spark NLP to date, making it the most widely used NLP library in the enterprise and providing the AI community with state-of-the-art accuracy and scale for a variety of common NLP tasks. The most recent releases include pre-trained models for over 200 languages, including languages that do not use spaces to separate words (such as Chinese, Japanese, and Korean) and languages written from right to left (such as Arabic, Farsi, Urdu, and Hebrew). All software and models are free and open source under an Apache 2.0 license.

This webinar will show you how to leverage the multi-lingual capabilities of Spark NLP & NLU – including automated language detection for up to 375 languages, and the ability to perform translation, named entity recognition, stopword removal, lemmatization, and more in a variety of language families. We will create Python code in real-time and solve these problems in just 30 minutes. The notebooks will then be made freely available online.

You can watch the video here.

NLU 1.1.3 New Notebooks and tutorials

New Webinar Notebooks

  1. NLU basics, easy 1-liners (Spellchecking, sentiment, NER, POS, BERT)
  2. Analyze Crypto News dataset with Keyword extraction, NER, Emotional distribution, and stemming
  3. Translate Crypto News dataset between 300 Languages with the Marian Model (German, French, Hebrew examples)
  4. Translate Crypto News dataset between 300 Languages with the Marian Model (Hindi, Russian, Chinese examples)
  5. Analyze Chinese News Headlines with Chinese Word Segmentation, Lemmatization, NER, and Keyword extraction
  6. Train a Sentiment Classifier that will understand 100+ languages on just a French Dataset with the powerful Language Agnostic Bert Embeddings
  7. Summarize text and Answer Questions with T5
  8. Solve any task in 1 line from SQuAD, GLUE, and SuperGLUE with T5
  9. Overview of models for various languages

New easy NLU 1-liners in NLU 1.1.3

Detect actions in general commands related to music, restaurants, and movies.

import nlu
nlu.load("en.classify.snips").predict("book a spot for nona gray  myrtle and alison at a top-rated brasserie that is distant from wilson av on nov  the 4th  2030 that serves ouzeri", output_level="document")

Output:

| ner_confidence | entities | document | Entities_Classes |
|----------------|----------|----------|------------------|
| [1.0, 1.0, 0.9997000098228455, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9990000128746033, 1.0, 1.0, 1.0, 0.9965000152587891, 0.9998999834060669, 0.9567000269889832, 1.0, 1.0, 1.0, 0.9980000257492065, 0.9991999864578247, 0.9988999962806702, 1.0, 1.0, 0.9998999834060669] | ['nona gray myrtle and alison', 'top-rated', 'brasserie', 'distant', 'wilson av', 'nov the 4th 2030', 'ouzeri'] | book a spot for nona gray myrtle and alison at a top-rated brasserie that is distant from wilson av on nov the 4th 2030 that serves ouzeri | ['party_size_description', 'sort', 'restaurant_type', 'spatial_relation', 'poi', 'timeRange', 'cuisine'] |
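The `entities` and `Entities_Classes` columns are parallel lists, so the extracted slots can be paired up with plain Python. The following is a minimal sketch that uses the values from the output above; the helper name `pair_slots` is hypothetical and not part of NLU.

```python
def pair_slots(entities, classes):
    """Pair each extracted entity with its class label, position by position."""
    if len(entities) != len(classes):
        raise ValueError("entities and classes must have the same length")
    return list(zip(classes, entities))


# Values copied from the example output of en.classify.snips.
entities = ['nona gray myrtle and alison', 'top-rated', 'brasserie', 'distant',
            'wilson av', 'nov the 4th 2030', 'ouzeri']
classes = ['party_size_description', 'sort', 'restaurant_type',
           'spatial_relation', 'poi', 'timeRange', 'cuisine']

# A dict is convenient here because every class occurs once in this example.
# If a class can repeat, keep the list of pairs instead.
slots = dict(pair_slots(entities, classes))
print(slots['restaurant_type'], '|', slots['cuisine'])
```

Keeping the list of pairs rather than the dict preserves duplicates when the same class is detected more than once in a sentence.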

Named Entity Recognition (NER) Model in Bengali (bengaliner_cc_300d)

# Bengali for: 'Iajuddin Ahmed passed Matriculation from Munshiganj High School in 1948 and Intermediate from Munshiganj Horganga College in 1950.'
nlu.load("bn.ner.cc_300d").predict("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন",output_level = "document")

Output:

| ner_confidence | entities | Entities_Classes | document |
|----------------|----------|------------------|----------|
| [0.9987999796867371, 0.9854000210762024, 0.8604000210762024, 0.6686999797821045, 0.5289999842643738, 0.7009999752044678, 0.7684999704360962, 0.9979000091552734, 0.9976000189781189, 0.9930999875068665, 0.9994000196456909, 0.9879000186920166, 0.7407000064849854, 0.9215999841690063, 0.7657999992370605, 0.39419999718666077, 0.9124000072479248, 0.9932000041007996, 0.9919999837875366, 0.995199978351593, 0.9991999864578247] | ['সালে', 'ইয়াজউদ্দিন আহম্মেদ', 'মুন্সিগঞ্জ উচ্চ বিদ্যালয়', 'সালে', 'মুন্সিগঞ্জ হরগঙ্গা কলেজ'] | ['TIME', 'PER', 'ORG', 'TIME', 'ORG'] | ১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন |
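In this example, `ner_confidence` holds 21 scores and the sentence has 21 whitespace-separated words, which suggests one score per token. Under that assumption (it is not stated in the release notes), low-confidence tokens can be flagged for review. The helper `low_confidence_tokens` and the 0.6 threshold are hypothetical, and the scores are rounded to four decimals for readability.

```python
def low_confidence_tokens(document, confidences, threshold=0.6):
    """Return (index, token, score) for every token scoring below the threshold.

    Assumes one confidence score per whitespace-separated token.
    """
    tokens = document.split()
    if len(tokens) != len(confidences):
        raise ValueError("token and score counts differ; the assumption does not hold")
    return [(i, token, score)
            for i, (token, score) in enumerate(zip(tokens, confidences))
            if score < threshold]


# Sentence and scores from the bn.ner.cc_300d example (scores rounded).
document = ("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন "
            "এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন")
ner_confidence = [0.9988, 0.9854, 0.8604, 0.6687, 0.5290, 0.7010, 0.7685,
                  0.9979, 0.9976, 0.9931, 0.9994, 0.9879, 0.7407, 0.9216,
                  0.7658, 0.3942, 0.9124, 0.9932, 0.9920, 0.9952, 0.9992]

flagged = low_confidence_tokens(document, ner_confidence)
for index, token, score in flagged:
    print(index, token, score)
```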

Identify intent in general text – SNIPS dataset

nlu.load("en.ner.snips").predict("I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area",output_level = "document")

Output:

| document | snips | snips_confidence |
|----------|-------|------------------|
| I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area | BookRestaurant | 1 |
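A predicted intent and its confidence can drive simple routing logic in an application. The sketch below is illustrative only: the `route` function, the handler replies, and the 0.8 threshold are hypothetical, and `PlayMusic` is assumed to be another label the classifier can emit because it is one of the SNIPS intents.

```python
def route(intent, confidence, handlers, min_confidence=0.8):
    """Call the handler registered for an intent, or fall back when unsure."""
    handler = handlers.get(intent)
    if handler is None or confidence < min_confidence:
        return "Sorry, I did not understand that."
    return handler()


# Hypothetical handlers keyed by intent label.
handlers = {
    "BookRestaurant": lambda: "Starting a restaurant booking.",
    "PlayMusic": lambda: "Starting music playback.",
}

# Intent and confidence taken from the example output (BookRestaurant, 1).
reply = route("BookRestaurant", 1.0, handlers)
print(reply)
```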

Word Embeddings for Bengali (bengali_cc_300d)

# Bengali for: 'Iajuddin Ahmed passed Matriculation from Munshiganj High School in 1948 and Intermediate from Munshiganj Horganga College in 1950.'
nlu.load("bn.embed").predict("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন",output_level = "document")

Output:

| document | bn_embed_embeddings |
|----------|---------------------|
| ১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন | [-0.0828 0.0683 0.0215 … 0.0679 -0.0484…] |

NLU 1.1.3 Enhancements

  • Added automatic conversion of Word Embeddings to Sentence Embeddings when no Sentence Embedding is available and a model needs the converted version to run.
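The release notes do not say how the conversion is performed internally. One common approach, and one of the pooling strategies offered by Spark NLP's SentenceEmbeddings annotator, is to average the word vectors of a sentence. The sketch below illustrates that idea in plain Python; `average_pool` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real 300-dimensional embeddings.

```python
def average_pool(word_vectors):
    """Average equal-length word vectors into a single sentence vector."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(vector) != dim for vector in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    # zip(*word_vectors) iterates over the vectors dimension by dimension.
    return [sum(column) / len(word_vectors) for column in zip(*word_vectors)]


# Two toy word vectors; the result is their element-wise mean.
sentence_vector = average_pool([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(sentence_vector)
```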

NLU 1.1.3 Bug Fixes

  • Fixed a bug that caused ur.sentiment NLU pipeline to build incorrectly
  • Fixed a bug that caused sentiment.imdb.glove NLU pipeline to build incorrectly
  • Fixed a bug that caused en.sentiment.glove.imdb NLU pipeline to build incorrectly
  • Fixed a bug that caused Spark 2.3.X environments to crash.

NLU Installation

# PyPI (the leading "!" runs the command from a notebook cell)
!pip install nlu pyspark==2.4.7
# Conda: install NLU from Anaconda/Conda
conda install -c johnsnowlabs nlu

Additional NLU resources