
We are very excited to release NLU 1.1.1!
This release features 3 new tutorial notebooks for Open/Closed book question answering with Google’s T5, Intent classification, and Aspect Based NER.
In addition, NLU 1.1.1 comes with 25+ pre-trained models and pipelines in Amharic, Bengali, Bhojpuri, Japanese, and Korean from the Spark NLP 2.7.2 release. Finally, NLU now supports running on Spark 2.3 clusters.
NLU 1.1.1 New Non-English Models
| Language | nlu.load() reference | Spark NLP Model reference | Type |
|---|---|---|---|
| Arabic | ar.ner | arabic_w2v_cc_300d | Named Entity Recognizer |
| Arabic | ar.embed.aner | aner_cc_300d | Word Embedding |
| Arabic | ar.embed.aner.300d | aner_cc_300d | Word Embedding (Alias) |
| Bengali | bn.stopwords | stopwords_bn | Stopwords Cleaner |
| Bengali | bn.pos | pos_msri | Part of Speech |
| Thai | th.segment_words | wordseg_best | Word Segmenter |
| Thai | th.pos | pos_lst20 | Part of Speech |
| Thai | th.sentiment | sentiment_jager_use | Sentiment Classifier |
| Thai | th.classify.sentiment | sentiment_jager_use | Sentiment Classifier (Alias) |
| Chinese | zh.pos.ud_gsd_trad | pos_ud_gsd_trad | Part of Speech |
| Chinese | zh.segment_words.gsd | wordseg_gsd_ud_trad | Word Segmenter |
| Bihari | bh.pos | pos_ud_bhtb | Part of Speech |
| Amharic | am.pos | pos_ud_att | Part of Speech |
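The `nlu.load()` references in the table follow a dotted pattern: a language code first, then the task and, optionally, a variant. A minimal sketch of that convention in plain Python, using a few rows copied from the table above. The `split_reference` helper is hypothetical and not part of NLU; it only illustrates how the references are put together.

```python
# A few rows copied from the table above: nlu.load() reference -> Spark NLP model.
NLU_TO_SPARK_NLP = {
    "bn.stopwords": "stopwords_bn",
    "bn.pos": "pos_msri",
    "th.segment_words": "wordseg_best",
    "th.pos": "pos_lst20",
    "zh.pos.ud_gsd_trad": "pos_ud_gsd_trad",
    "bh.pos": "pos_ud_bhtb",
    "am.pos": "pos_ud_att",
}


def split_reference(reference):
    """Split a dotted reference into (language code, remaining task path).

    Hypothetical helper for illustration only; it is not an NLU function.
    """
    language, _, task = reference.partition(".")
    return language, task


for ref, model in NLU_TO_SPARK_NLP.items():
    language, task = split_reference(ref)
    print(f"{ref:<20} language={language} task={task:<18} model={model}")
```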
NLU 1.1.1 New English Models and Pipelines
New Easy NLU 1-liner Examples:
Extract aspects and entities from airline questions (ATIS dataset)
nlu.load("en.ner.atis").predict("i want to fly from baltimore to dallas round trip")
output: ["baltimore", "dallas", "round trip"]
Intent Classification for Airline Traffic Information System queries (ATIS dataset)
nlu.load("en.classify.questions.atis").predict("what is the price of flight from newyork to washington")
output: "atis_airfare"
Recognize Entities OntoNotes – ELECTRA Large
nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London.")
output: ["Johnson", "first", "2001", "eight years", "London"]
Question classification of open-domain and fact-based questions Pipeline – TREC50
nlu.load("en.classify.trec50.pipe").predict("When did the construction of stone circles begin in the UK? ")
output: LOC_other
Traditional Chinese Word Segmentation
# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.segment_words.gsd").predict("然而,這樣的處理也衍生了一些問題。")
output: ["然而",",","這樣","的","處理","也","衍生","了","一些","問題","。"]
Part of Speech for Traditional Chinese
# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.pos.ud_gsd_trad").predict("然而,這樣的處理也衍生了一些問題。")
Output:
| Token | POS |
|---|---|
| 然而 | ADV |
| , | PUNCT |
| 這樣 | PRON |
| 的 | PART |
| 處理 | NOUN |
| 也 | ADV |
| 衍生 | VERB |
| 了 | PART |
| 一些 | ADJ |
| 問題 | NOUN |
| 。 | PUNCT |
Thai Word Segment Recognition
# 'Mona Lisa is a 16th-century oil painting created by Leonardo held at the Louvre in Paris' in Thai
nlu.load("th.segment_words").predict("Mona Lisa เป็นภาพวาดสีน้ำมันในศตวรรษที่ 16 ที่สร้างโดย Leonardo จัดขึ้นที่พิพิธภัณฑ์ลูฟร์ในปารีส")
Output:
| token |
|---|
| M |
| o |
| n |
| a |
| Lisa |
| เป็น |
| ภาพ |
| ว |
| า |
| ด |
| สีน้ำ |
| มัน |
| ใน |
| ศตวรรษ |
| ที่ |
| 16 |
| ที่ |
| สร้าง |
| โ |
| ด |
| ย |
| L |
| e |
| o |
| n |
| a |
| r |
| d |
| o |
| จัด |
| ขึ้น |
| ที่ |
| พิพิธภัณฑ์ |
| ลูฟร์ |
| ใน |
| ปารีส |
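In the output above, the Latin-script words "Mona" and "Leonardo" come back as single-letter tokens. If that is unwanted downstream, one possible post-processing step is to join runs of single ASCII letters. This is a hypothetical sketch in plain Python, not something NLU does for you, and `merge_latin_letters` is an illustrative name. It would also merge genuinely separate one-letter words, so treat it as a starting point only.

```python
import string

# Tokens copied from parts of the output table above (not the full list).
tokens = ["M", "o", "n", "a", "Lisa", "เป็น", "ภาพ",
          "L", "e", "o", "n", "a", "r", "d", "o", "จัด"]


def merge_latin_letters(tokens):
    """Join runs of single ASCII-letter tokens into one token."""
    merged, run = [], []
    for token in tokens:
        if len(token) == 1 and token in string.ascii_letters:
            run.append(token)  # keep collecting the current run of letters
            continue
        if run:
            merged.append("".join(run))  # flush the finished run
            run = []
        merged.append(token)
    if run:
        merged.append("".join(run))  # flush a run that ends the list
    return merged


print(merge_latin_letters(tokens))
```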
Part of Speech for Bengali (POS)
# 'The village is also called 'Mod' in Tora language' in Bengali
nlu.load("bn.pos").predict("বাসস্থান-ঘরগৃহস্থালি তোড়া ভাষায় গ্রামকেও বলে ` মোদ ' ৷")
Output:
| token | pos |
|---|---|
| বাসস্থান-ঘরগৃহস্থালি | NN |
| তোড়া | NNP |
| ভাষায় | NN |
| গ্রামকেও | NN |
| বলে | VM |
| ` | SYM |
| মোদ | NN |
| ' | SYM |
| ৷ | SYM |
Stop Words Cleaner for Bengali
# 'This language is not enough' in Bengali
df = nlu.load("bn.stopwords").predict("এই ভাষা যথেষ্ট নয়")
Output:
| cleanTokens | token |
|---|---|
| ভাষা | এই |
| যথেষ্ট | ভাষা |
| নয় | যথেষ্ট |
| None | নয় |
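The `None` in the last row appears because `cleanTokens` has one entry fewer than `token` once the stop word is removed, and the two columns are shown side by side. A minimal sketch in plain Python that reproduces the rows above. It assumes, for illustration only, that "এই" is the only stop word involved; the actual list ships with the `stopwords_bn` model.

```python
from itertools import zip_longest

# Tokens of the example sentence, in order.
tokens = ["এই", "ভাষা", "যথেষ্ট", "নয়"]

# Assumption for illustration: only this one word is treated as a stop word.
stop_words = {"এই"}

# Dropping the stop word leaves a shorter list than the original tokens.
clean_tokens = [t for t in tokens if t not in stop_words]

# Pair the two columns row by row; the shorter column is padded with None.
rows = list(zip_longest(clean_tokens, tokens))
for clean, token in rows:
    print(clean, "|", token)
```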
Part of Speech for Bhojpuri
# 'The people of Ohu know that the foundation of Bhojpuri was shaken' in Bhojpuri
nlu.load('bh.pos').predict("ओहु लोग के मालूम बा कि श्लील होखते भोजपुरी के नींव हिल जाई")
Output:
| pos | token |
|---|---|
| DET | ओहु |
| NOUN | लोग |
| ADP | के |
| NOUN | मालूम |
| VERB | बा |
| SCONJ | कि |
| ADJ | श्लील |
| VERB | होखते |
| PROPN | भोजपुरी |
| ADP | के |
| NOUN | नींव |
| VERB | हिल |
| AUX | जाई |
Amharic Part of Speech (POS)
# ' "Son, finish the job," he said.' in Amharic
nlu.load('am.pos').predict('ልጅ ኡ ን ሥራ ው ን አስጨርስ ኧው ኣል ኧሁ ። "')
Output:
| pos | token |
|---|---|
| NOUN | ልጅ |
| DET | ኡ |
| PART | ን |
| NOUN | ሥራ |
| DET | ው |
| PART | ን |
| VERB | አስጨርስ |
| PRON | ኧው |
| AUX | ኣል |
| PRON | ኧሁ |
| PUNCT | ። |
| NOUN | " |
Thai Sentiment Classification
# 'I love peanut butter and jelly!' in Thai
nlu.load('th.classify.sentiment').predict('ฉันชอบเนยถั่วและเยลลี่!')[['sentiment','sentiment_confidence']]
Output:
| sentiment | sentiment_confidence |
|---|---|
| positive | 0.999998 |
Arabic Named Entity Recognition (NER)
# 'In 1918, the forces of the Arab Revolt liberated Damascus with the help of the British' in Arabic
nlu.load('ar.ner').predict('في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز',output_level='chunk')[['entities_confidence','ner_confidence','entities']]
Output:
| entity_class | ner_confidence | entities |
|---|---|---|
| ORG | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | قوات الثورة العربية |
| LOC | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | دمشق |
| PER | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | الإنكليز |
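Each row above repeats the same 11 confidence values, which matches the 11 whitespace-separated tokens of the input sentence. Assuming the list is aligned with those tokens in order (an assumption, not confirmed by the output itself), the scores belonging to each entity can be picked out by position. A minimal sketch in plain Python; `chunk_confidences` is a hypothetical helper.

```python
sentence = "في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز"
ner_confidence = [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063,
                  0.9987999796867371, 0.9990000128746033, 0.9998999834060669,
                  0.9998999834060669, 0.9993000030517578, 0.9998999834060669]
entities = ["قوات الثورة العربية", "دمشق", "الإنكليز"]

# Assumption: whitespace tokens line up one-to-one with the model's tokens.
tokens = sentence.split()


def chunk_confidences(chunk):
    """Return the per-token confidences for the first occurrence of chunk."""
    words = chunk.split()
    for start in range(len(tokens) - len(words) + 1):
        if tokens[start:start + len(words)] == words:
            return ner_confidence[start:start + len(words)]
    return []  # chunk not found in the sentence


for entity in entities:
    scores = chunk_confidences(entity)
    print(entity, scores, "min:", min(scores) if scores else None)
```

Taking the minimum per entity is one conservative way to summarise a multi-token chunk; a mean would work as well.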
NLU 1.1.1 Enhancements
- Spark 2.3 compatibility
New NLU Notebooks and Tutorials
- Intent Classification for Airline Messages (ATIS)
Installation
# PyPI
!pip install nlu pyspark==2.4.7
# Conda: install NLU from Anaconda/Conda
conda install -c johnsnowlabs nlu
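After installing, it can help to confirm which versions actually ended up in the environment, for example to check that `pyspark` matches the pinned 2.4.7. A minimal sketch using only the Python standard library (Python 3.8+); `installed_versions` is an illustrative helper, not part of NLU, and it reads package metadata without importing the packages or starting Spark.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("nlu", "pyspark")):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


print(installed_versions())
```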