| Language | nlu.load() reference | Spark NLP Model reference | Type |
|---|---|---|---|
| Arabic | ar.ner | arabic_w2v_cc_300d | Named Entity Recognizer |
| Arabic | ar.embed.aner | aner_cc_300d | Word Embedding |
| Arabic | ar.embed.aner.300d | aner_cc_300d | Word Embedding (Alias) |
| Bengali | bn.stopwords | stopwords_bn | Stopwords Cleaner |
| Bengali | bn.pos | pos_msri | Part of Speech |
| Thai | th.segment_words | wordseg_best | Word Segmenter |
| Thai | th.pos | pos_lst20 | Part of Speech |
| Thai | th.sentiment | sentiment_jager_use | Sentiment Classifier |
| Thai | th.classify.sentiment | sentiment_jager_use | Sentiment Classifier (Alias) |
| Chinese | zh.pos.ud_gsd_trad | pos_ud_gsd_trad | Part of Speech |
| Chinese | zh.segment_words.gsd | wordseg_gsd_ud_trad | Word Segmenter |
| Bihari | bh.pos | pos_ud_bhtb | Part of Speech |
| Amharic | am.pos | pos_ud_att | Part of Speech |

NLU 1.1.1 New English Models and Pipelines

| Language | nlu.load() reference | Spark NLP Model reference | Type |
|---|---|---|---|
| English | en.sentiment.glove | analyze_sentimentdl_glove_imdb | Sentiment Classifier |
| English | en.sentiment.glove.imdb | analyze_sentimentdl_glove_imdb | Sentiment Classifier (Alias) |
| English | en.classify.sentiment.glove.imdb | analyze_sentimentdl_glove_imdb | Sentiment Classifier (Alias) |
| English | en.classify.sentiment.glove | analyze_sentimentdl_glove_imdb | Sentiment Classifier (Alias) |
| English | en.classify.trec50.pipe | classifierdl_use_trec50_pipeline | Question Classifier |
| English | en.ner.onto.large | onto_recognize_entities_electra_large | Named Entity Recognizer |
| English | en.classify.questions.atis | classifierdl_use_atis | Intent Classifier |
| English | en.classify.questions.airline | classifierdl_use_atis | Intent Classifier (Alias) |
| English | en.classify.intent.atis | classifierdl_use_atis | Intent Classifier (Alias) |
| English | en.classify.intent.airline | classifierdl_use_atis | Intent Classifier (Alias) |
| English | en.ner.atis | nerdl_atis_840b_300d | Aspect-based NER |
| English | en.ner.airline | nerdl_atis_840b_300d | Aspect-based NER (Alias) |
| English | en.ner.aspect.airline | nerdl_atis_840b_300d | Aspect-based NER (Alias) |
| English | en.ner.aspect.atis | nerdl_atis_840b_300d | Aspect-based NER (Alias) |
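
Several references in the table above are aliases that load the same Spark NLP model. As a minimal sketch, the mapping below is copied by hand from the table and grouped by model to make the alias sets explicit. It does not query NLU itself, and `NLU_REFERENCES` and `group_aliases` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Hand-copied from the table above; NLU keeps its own internal mapping,
# which this illustrative dict does not read.
NLU_REFERENCES = {
    "en.sentiment.glove": "analyze_sentimentdl_glove_imdb",
    "en.sentiment.glove.imdb": "analyze_sentimentdl_glove_imdb",
    "en.classify.sentiment.glove.imdb": "analyze_sentimentdl_glove_imdb",
    "en.classify.sentiment.glove": "analyze_sentimentdl_glove_imdb",
    "en.classify.trec50.pipe": "classifierdl_use_trec50_pipeline",
    "en.ner.onto.large": "onto_recognize_entities_electra_large",
    "en.classify.questions.atis": "classifierdl_use_atis",
    "en.classify.questions.airline": "classifierdl_use_atis",
    "en.classify.intent.atis": "classifierdl_use_atis",
    "en.classify.intent.airline": "classifierdl_use_atis",
    "en.ner.atis": "nerdl_atis_840b_300d",
    "en.ner.airline": "nerdl_atis_840b_300d",
    "en.ner.aspect.airline": "nerdl_atis_840b_300d",
    "en.ner.aspect.atis": "nerdl_atis_840b_300d",
}


def group_aliases(references):
    """Group nlu.load() references by the Spark NLP model they point to."""
    groups = defaultdict(list)
    for reference, model in references.items():
        groups[model].append(reference)
    return dict(groups)


alias_groups = group_aliases(NLU_REFERENCES)
for model, references in alias_groups.items():
    print(model, "<-", ", ".join(references))
```

Any reference in a group can be passed to `nlu.load()` interchangeably, since they all point to the same model.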

New Easy NLU 1-liner Examples:

Extract aspects and entities from airline questions (ATIS dataset)

      nlu.load("en.ner.atis").predict("i want to fly from baltimore to dallas round trip")
      output:  ["baltimore", "dallas", "round trip"]

Intent Classification for Airline Traffic Information System queries (ATIS dataset)

      nlu.load("en.classify.questions.atis").predict("what is the price of flight from newyork to washington")
      output:  "atis_airfare"

Recognize Entities OntoNotes – ELECTRA Large

      nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London.")
      output:  ["Johnson", "first", "2001", "eight years", "London"]

Question classification of open-domain and fact-based questions Pipeline – TREC50

      nlu.load("en.classify.trec50.pipe").predict("When did the construction of stone circles begin in the UK? ")
      output:  "LOC_other"

Traditional Chinese Word Segmentation

      # 'However, this treatment also creates some problems' in Chinese
      nlu.load("zh.segment_words.gsd").predict("然而,這樣的處理也衍生了一些問題。")
      output:  ["然而",",","這樣","的","處理","也","衍生","了","一些","問題","。"]
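
A quick sanity check on the segmenter output, using only the values shown above: word segmentation only inserts boundaries, so concatenating the tokens should give back the input sentence. This sketch is plain Python and does not call NLU; the token list is copied from the output above.

```python
sentence = "然而,這樣的處理也衍生了一些問題。"
tokens = ["然而", ",", "這樣", "的", "處理", "也", "衍生", "了", "一些", "問題", "。"]

# Joining the tokens without a separator should rebuild the original string
# if the segmenter neither dropped nor altered any character.
rebuilt = "".join(tokens)
is_lossless = rebuilt == sentence
print(len(tokens), "tokens, lossless:", is_lossless)
```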

Part of Speech for Traditional Chinese

      # 'However, this treatment also creates some problems' in Chinese
      nlu.load("zh.pos.ud_gsd_trad").predict("然而,這樣的處理也衍生了一些問題。")

Output:

| Token | POS |
|---|---|
| 然而 | ADV |
| , | PUNCT |
| 這樣 | PRON |
| 的 | PART |
| 處理 | NOUN |
| 也 | ADV |
| 衍生 | VERB |
| 了 | PART |
| 一些 | ADJ |
| 問題 | NOUN |
| 。 | PUNCT |

Thai Word Segmentation

      # 'Mona Lisa is a 16th-century oil painting created by Leonardo held at the Louvre in Paris' in Thai
      nlu.load("th.segment_words").predict("Mona Lisa เป็นภาพวาดสีน้ำมันในศตวรรษที่ 16 ที่สร้างโดย Leonardo จัดขึ้นที่พิพิธภัณฑ์ลูฟร์ในปารีส")

Output:

token
M
o
n
a
Lisa
เป็น
ภาพ
สีน้ำ
มัน
ใน
ศตวรรษ
ที่
16
ที่
สร้าง
L
e
o
n
a
r
d
o
จัด
ขึ้น
ที่
พิพิธภัณฑ์
ลูฟร์
ใน
ปารีส

Part of Speech for Bengali (POS)

      # "The village is also called 'Mod' in Tora language" in Bengali
      nlu.load("bn.pos").predict("বাসস্থান-ঘরগৃহস্থালি তোড়া ভাষায় গ্রামকেও বলে ` মোদ ' ৷")

Output:

| token | pos |
|---|---|
| বাসস্থান-ঘরগৃহস্থালি | NN |
| তোড়া | NNP |
| ভাষায় | NN |
| গ্রামকেও | NN |
| বলে | VM |
| ` | SYM |
| মোদ | NN |
| ' | SYM |
| ৷ | SYM |

Stop Words Cleaner for Bengali

      # 'This language is not enough' in Bengali 
      df = nlu.load("bn.stopwords").predict("এই ভাষা যথেষ্ট নয়")

Output:

| cleanTokens | token |
|---|---|
| ভাষা | এই |
| যথেষ্ট | ভাষা |
| নয় | যথেষ্ট |
| None | নয় |
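
The two columns above have different lengths because the cleaner drops tokens: four tokens go in, three come out, and the shorter column is padded with None. The sketch below reproduces that shape in plain Python without calling NLU. The one-word stop word set is an assumption inferred from the output above (only the first token was removed); the real list ships with the `stopwords_bn` model.

```python
from itertools import zip_longest

tokens = "এই ভাষা যথেষ্ট নয়".split()

# Assumption for illustration: the first token is the only stop word here,
# as suggested by the output table above.
assumed_stopwords = {tokens[0]}

clean_tokens = [t for t in tokens if t not in assumed_stopwords]

# Pair the two columns row by row; the shorter one is padded with None.
rows = list(zip_longest(clean_tokens, tokens))
for clean, token in rows:
    print(clean, token)
```

Because the columns are aligned by position rather than by word, a value in `cleanTokens` does not correspond to the `token` on the same row.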

Part of Speech for Bihari

      # 'The people of Ohu know that the foundation of Bhojpuri was shaken' in Bhojpuri (Bihari)
      nlu.load('bh.pos').predict("ओहु लोग के मालूम बा कि श्लील होखते भोजपुरी के नींव हिल जाई")

Output:

| pos | token |
|---|---|
| DET | ओहु |
| NOUN | लोग |
| ADP | के |
| NOUN | मालूम |
| VERB | बा |
| SCONJ | कि |
| ADJ | श्लील |
| VERB | होखते |
| PROPN | भोजपुरी |
| ADP | के |
| NOUN | नींव |
| VERB | हिल |
| AUX | जाई |

Amharic Part of Speech (POS)

      # ' "Son, finish the job," he said.' in Amharic
      nlu.load('am.pos').predict('ልጅ ኡ ን ሥራ ው ን አስጨርስ ኧው ኣል ኧሁ"')

Output:

| pos | token |
|---|---|
| NOUN | ልጅ |
| DET | ኡ |
| PART | ን |
| NOUN | ሥራ |
| DET | ው |
| PART | ን |
| VERB | አስጨርስ |
| PRON | ኧው |
| AUX | ኣል |
| PRON | ኧሁ |
| PUNCT | |
| NOUN | |

Thai Sentiment Classification

      # 'I love peanut butter and jelly!' in Thai
      nlu.load('th.classify.sentiment').predict('ฉันชอบเนยถั่วและเยลลี่!')[['sentiment','sentiment_confidence']]

Output:

| sentiment | sentiment_confidence |
|---|---|
| positive | 0.999998 |

Arabic Named Entity Recognition (NER)

      # 'In 1918, the forces of the Arab Revolt liberated Damascus with the help of the British' in Arabic
      nlu.load('ar.ner').predict('في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز',output_level='chunk')[['entities_confidence','ner_confidence','entities']]

Output:

| entity_class | ner_confidence | entities |
|---|---|---|
| ORG | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | قوات الثورة العربية |
| LOC | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | دمشق |
| PER | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | الإنكليز |
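
Each row above repeats the same `ner_confidence` list. It has 11 values and the input sentence has 11 whitespace-separated tokens, which suggests one confidence per token for the whole sentence rather than one per entity. That reading is an inference from the output, not documented behaviour. Under that assumption, the values can be lined up with the tokens in plain Python, without calling NLU:

```python
sentence = "في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز"
ner_confidence = [
    1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063,
    0.9987999796867371, 0.9990000128746033, 0.9998999834060669,
    0.9998999834060669, 0.9993000030517578, 0.9998999834060669,
]

# Assumption: the tagger's tokens match a plain whitespace split for this input.
tokens = sentence.split()
assert len(tokens) == len(ner_confidence)

# Pair every token with its confidence and find the least certain one.
per_token = list(zip(tokens, ner_confidence))
lowest_token, lowest_score = min(per_token, key=lambda pair: pair[1])
print(f"{len(per_token)} tokens, lowest confidence {lowest_score:.4f}")
```

If the assumption holds, a per-entity score could be derived by taking the minimum or mean over only the tokens that make up each chunk.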