Legal NLP 1.6.0: Applicable Law, Dispute clauses, new embeddings and much more!

02.02.2023

Juan Martinez

We are happy to announce that Legal NLP 1.6 is out.

Legal NLP is a John Snow Labs product, launched in 2022 to provide state-of-the-art, autoscalable, domain-specific NLP on top of Spark.

With more than 600 models, featuring Deep Learning and Transformer-based architectures, Legal NLP includes:

Annotators to carry out Named Entity Recognition, Relation Extraction, Assertion Status / Understanding Entities in Context, Data Mapping to external sources, Deidentification, Question Answering, Table Question Answering, Sentiment Analysis, Summarization and much more, for both training and inference;
Zero-shot Named Entity Recognition and Relation Extraction;
600+ pretrained Deep Learning / Transformer-based models;
Full integration with Databricks, AWS or Azure;
33+ notebooks and 25+ demos ready to showcase its features;
Full integration with NLP Lab (formerly Annotation Lab) for managing your annotation projects and training your Legal NLP models in a zero-code fashion;
Compatibility with Visual NLP, to combine OCR/Visual capabilities, such as Signature Extraction, Form Recognition or Table Detection, with Legal NLP.

Portuguese, Italian, French, English Legal Language Models

This is a list of the new language models (LMs) you can use to calculate embeddings for your texts and train Legal NLP models on top of them (including English, Portuguese, Italian and French):

bert_embeddings_Legal_BERTimbau_base_pt
bert_embeddings_legal_bert_base_cased_ptbr_pt
bert_embeddings_bert_large_portuguese_cased_legal_mlm_pt
bert_embeddings_Legal_BERTimbau_large_pt
bert_embeddings_Italian_Legal_BERT_it
bert_embeddings_legal_bert_small_uncased_en
bert_embeddings_custom_legalbert_en
bert_embeddings_legalbert_en
bert_embeddings_legalbert_large_1.7M_2_en
bert_embeddings_Legal_heBERT_he
bert_embeddings_legalbert_large_1.7M_1_en
bert_embeddings_legal_bert_base_uncased_finetuned_ledgarscotus7_en
bert_embeddings_Legal_heBERT_ft_he
bert_embeddings_bert_small_finetuned_legal_contracts_larger4010_en
bert_embeddings_bert_tiny_finetuned_legal_definitions_en
bert_embeddings_bert_small_finetuned_legal_definitions_en
bert_embeddings_legal_bert_base_uncased_finetuned_RRamicus_en
bert_embeddings_bert_small_finetuned_legal_definitions_longer_en
bert_embeddings_bert_small_finetuned_legal_contracts10train10val_en
bert_embeddings_bert_small_finetuned_legal_contracts_larger20_5_1_en
camembert_embeddings_Italian_Legal_BERT_SC_it
camembert_embeddings_legal_camembert_fr
camembert_embeddings_legal_distilcamembert_fr
camembert_embeddings_lsg16k_Italian_Legal_BERT_SC_it
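As a rough sketch of how one of these embeddings might be plugged into a pipeline, the snippet below makes two assumptions that this post does not confirm: that the trailing suffix of each identifier (en, pt, it, fr, he) is the language code to pass to pretrained(), and that the models load through nlp.BertEmbeddings or nlp.CamemBertEmbeddings from the johnsnowlabs library. The helper names split_model_id and build_embeddings_pipeline are hypothetical.

```python
def split_model_id(model_id):
    """Split e.g. 'bert_embeddings_legalbert_en' into (name, language).

    Assumption: the last underscore-separated token is the language code.
    """
    name, _, lang = model_id.rpartition("_")
    return name, lang


def build_embeddings_pipeline(model_id):
    """Build document -> token -> embeddings stages for one listed model.

    Assumption: bert_* ids load with nlp.BertEmbeddings and camembert_* ids
    with nlp.CamemBertEmbeddings. Needs johnsnowlabs installed and a Spark
    session already started with nlp.start().
    """
    from johnsnowlabs import nlp  # imported lazily, only when actually used

    name, lang = split_model_id(model_id)
    if name.startswith("camembert"):
        loader = nlp.CamemBertEmbeddings
    else:
        loader = nlp.BertEmbeddings

    document = nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokens = nlp.Tokenizer().setInputCols(["document"]).setOutputCol("token")
    embeddings = (
        loader.pretrained(name, lang)
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )
    return nlp.Pipeline(stages=[document, tokens, embeddings])
```

The resulting "embeddings" column is what a downstream NER or classification stage would take as input when you train on top of these models.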

Applicable Law NER and Classification

Classify paragraphs talking about Applicable Law and extract the laws from them using legclf_applicable_law_cuad_en and legner_applicable_law_clause.

APPLIC_LAW NER
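The two-step flow described above (classify the paragraph first, then extract the laws from it) can be sketched in plain Python. This is only an illustration of the control flow: is_relevant and extract_entities are hypothetical callables standing in for the classifier and the NER model, and the demo values below are hand-written stand-ins, not model output.

```python
def extract_from_relevant(paragraphs, is_relevant, extract_entities):
    """Run the NER step only on paragraphs the classifier accepts.

    is_relevant and extract_entities are hypothetical wrappers around the
    classification and NER models; how they are built is not shown here.
    """
    results = []
    for paragraph in paragraphs:
        if is_relevant(paragraph):
            results.append((paragraph, extract_entities(paragraph)))
    return results


paragraphs = [
    "This Agreement shall be governed by the laws of the State of New York.",
    "Fees are payable within thirty days.",
]

demo = extract_from_relevant(
    paragraphs,
    # Stand-in for the Applicable Law classifier.
    is_relevant=lambda p: "governed by" in p,
    # Stand-in for the NER model; the span is hand-written for illustration.
    extract_entities=lambda p: [("APPLIC_LAW", "laws of the State of New York")],
)
```

Filtering before extraction keeps the NER model away from paragraphs where it has nothing to find, which is the reason to pair the two models.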

Dispute Clauses NER and Classification

Classify paragraphs talking about Dispute Clauses using legclf_dispute_clauses_cuad and extract entities from them using legner_dispute_clauses.

COURT_NAME and RULES_NAME

11 New Clause Classifiers

Identify clauses in your legal agreements with these new binary clause classifiers. Given an input paragraph, each one returns one of the following categories:

– legclf_tax_matters_md: [tax_matter, other]
– legclf_choice_of_law_md: [choice_of_law, other]
– legclf_term_of_agreement_md: [term_of_agreement, other]
– legclf_attorney_fees_md: [attorney_fees, other]
– legclf_effect_of_termination_md: [effect_of_termination, other]
– legclf_affirmative_covenants_md: [affirmative_covenants, other]
– legclf_amendments_and_waivers_md: [amendments_and_waivers, other]
– legclf_miscellaneous_provisions_md: [miscellaneous_provisions, other]
– legclf_conditions_precedent_md: [conditions_precedent, other]
– legclf_fees_and_expenses_md: [fees_and_expenses, other]
– legclf_termination_for_cause_md: [termination_for_cause, other]
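Because each classifier is binary, tagging a paragraph with every clause type it matches means running all of them and keeping the answers that are not "other". The sketch below shows one way that could look. The labels are copied from the list above; load_classifier assumes the models load with legal.ClassifierDLModel from the "legal/models" location and read a "sentence_embeddings" column, none of which this post states, and the shape of the predictions dictionary is hypothetical.

```python
# Positive label of each new classifier, copied from the list above.
CLAUSE_CLASSIFIERS = {
    "legclf_tax_matters_md": "tax_matter",
    "legclf_choice_of_law_md": "choice_of_law",
    "legclf_term_of_agreement_md": "term_of_agreement",
    "legclf_attorney_fees_md": "attorney_fees",
    "legclf_effect_of_termination_md": "effect_of_termination",
    "legclf_affirmative_covenants_md": "affirmative_covenants",
    "legclf_amendments_and_waivers_md": "amendments_and_waivers",
    "legclf_miscellaneous_provisions_md": "miscellaneous_provisions",
    "legclf_conditions_precedent_md": "conditions_precedent",
    "legclf_fees_and_expenses_md": "fees_and_expenses",
    "legclf_termination_for_cause_md": "termination_for_cause",
}


def load_classifier(model_name):
    """Load one classifier stage (assumed loader, location and column names).

    The upstream sentence-embeddings stage is not shown; it has to match
    whatever embeddings the model was trained with.
    """
    from johnsnowlabs import legal  # imported lazily, needs a valid license

    return (
        legal.ClassifierDLModel.pretrained(model_name, "en", "legal/models")
        .setInputCols(["sentence_embeddings"])
        .setOutputCol("category")
    )


def detected_clauses(predictions):
    """Return the clause categories found in one paragraph.

    `predictions` maps model name -> predicted label for the same paragraph
    (hypothetical shape). Anything that is not the model's own positive
    label, including "other", is dropped.
    """
    return sorted(
        label
        for model, label in predictions.items()
        if CLAUSE_CLASSIFIERS.get(model) == label
    )
```

For example, a paragraph labelled tax_matter by the first model and other by the rest would come back as a single-item list.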

Contextual Parser (scalable rule-based NER)

Detect HEADERs and SUBHEADERs in legal agreements, for splitting purposes. The parser is included in legpipe_header_subheader, along with other DL-based models, for a hybrid extraction approach.

Split legal agreements into sections using pretrained pipelines

Combine NER, Contextual Parser and ChunkSplitting models to split legal agreements into different sections automatically, using legpipe_header_subheader.
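To make the splitting idea concrete, here is a plain-Python illustration of what happens once header positions are known: the text is cut at each header offset. This is not the library's implementation, and split_on_headers is a hypothetical helper. load_header_pipeline assumes the pipeline loads with nlp.PretrainedPipeline using "en" and "legal/models", which this post does not state.

```python
def load_header_pipeline():
    """Load the pretrained pipeline (assumed language and model location)."""
    from johnsnowlabs import nlp  # imported lazily, needs a started session

    return nlp.PretrainedPipeline("legpipe_header_subheader", "en", "legal/models")


def split_on_headers(text, header_starts):
    """Cut `text` into sections, each beginning at a detected header offset.

    Text before the first header, if any, is returned as its own section.
    Offsets outside the text are ignored and blank sections are dropped.
    """
    cuts = sorted({offset for offset in header_starts if 0 < offset < len(text)})
    bounds = [0] + cuts + [len(text)]
    sections = [text[start:end] for start, end in zip(bounds, bounds[1:])]
    return [section for section in sections if section.strip()]
```

In a real run, the header offsets would come from the HEADER and SUBHEADER chunks the pipeline detects rather than being supplied by hand.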

Augmented legner_contract_doc_parties

We have improved our Legal NER model to detect Document Types, Parties, Aliases, Former Names, Organizations and Effective Dates in legal agreements. The new model is the large (lg) version and is available as legner_contract_doc_parties_lg.

New training notebooks and certification training

We have continued creating notebooks showcasing Legal NLP functionalities, with more than 33 notebooks available. Some of the new ones include:

Legal BertForTokenClassification training and inference, using any Legal Bert model available in Hugging Face.
ContextualParser for carrying out rule-based NER at scale, both training and inference.
Binary, Multiclass and Multilabel classification of legal texts, training and inference.
RelationExtraction based on spanBert training.
EntityResolution (normalization) and Chunk Mapping (data augmentation) training notebooks.
Using NER to split a text with ChunkSentenceSplitting.
Legal Use Case notebook: Creating graphs from Credit Agreements.

How to run

Legal NLP is very easy to run on both clusters and driver-only environments using the johnsnowlabs library:

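!pip install johnsnowlabs

from johnsnowlabs import nlp

# Log in through the browser to fetch your license and install the licensed libraries
nlp.install(force_browser=True)
# Start the Spark session
nlp.start()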

Fancy trying?

We’ve got 30-day free licenses for you, with technical support from our legal team of technical experts and SMEs. Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!

Juan Martinez

Juan Martinez is a Sr. Data Scientist who has worked at John Snow Labs since 2021. He graduated in Computer Engineering in 2006, and since then his main focus has been the application of Artificial Intelligence to texts and unstructured data. To better understand the intersection between Language and AI, he complemented his technical background with a Linguistics degree from Moscow Pushkin State Language Institute in 2012 and, later, from the University of Alcala (2014). He is part of the Healthcare Data Science team at John Snow Labs. His main activities are training and evaluation of Deep Learning, Semantic and Symbolic models within the Healthcare domain, benchmarking, research and team coordination tasks. His other areas of interest are Machine Learning operations and Infrastructure.