John Snow Labs Finance NLP 1.11 adds many new capabilities on top of the 135+ models and 35+ Language Models already available in previous versions of the library. Let’s take a look at each of them!
Native Financial Text Summarization: new annotator and 2 models
Financial texts such as reports and filings can be very long, verbose and complicated.
By using our new Financial Summarizer() module, you can get concise, state-of-the-art summaries of your financial documents while preserving their key information.
We included 2 models for Financial Summarization:
- The base model, with generic capabilities for summarizing financial documents.
- A fine-tuned model trained specifically to summarize sections of Financial Reports. For this task, we fine-tuned our base model on more than 8K sections from different SEC Financial Reports.
New Financial Word and Sentence Embeddings
Language Models provide a numeric representation of texts, which makes it possible to train better models. Word Embeddings are used to train token (word)-based classifiers, such as Named Entity Recognition. Sentence Embeddings are used to train Text Classifiers (at document, section or sentence level) or to calculate Sentence (text) Similarity.
It’s crucial that Language Models are trained on domain-specific texts, so that they understand the nuances of financial documents. We are happy to announce the inclusion of the following models:
Word Embeddings: English Financial RoBERTa and BERT, German, Japanese and Chinese Word Embeddings models.
Sentence Embeddings: Chinese Financial BERT, Chinese Financial DistilBERT, English Financial Embeddings (from SetFit).
Example of embeddings used to calculate Sentence Similarity:
We generate revenue primarily from subscription fees
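As a rough illustration of how Sentence Embeddings can be compared, the sketch below computes cosine similarity, a common choice for this task. The post does not state which metric the library uses, so treat that as an assumption. The vectors are hypothetical 4-dimensional stand-ins for real embeddings, and `cosine_similarity`, `query` and `candidates` are illustrative names, not part of Finance NLP.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real sentence embeddings have hundreds of dimensions.
query = [0.8, 0.1, 0.3, 0.5]
candidates = {
    "close_match": [0.7, 0.2, 0.3, 0.6],
    "unrelated": [-0.5, 0.9, -0.2, 0.0],
}

# Rank candidate sentences from most to least similar to the query.
ranked = sorted(
    ((name, cosine_similarity(query, vec)) for name, vec in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In practice the vectors would come from one of the Sentence Embeddings models listed above, and the same ranking step would apply unchanged.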
New mixed Financial and Legal NER model on SEC documents
Identify up to 9 entity types in SEC documents, including Businesses / Companies (ORG), People (PER), Legislation / Acts / Regulations (LAW), Locations (LOC), Government Institutions (INST), Courts (COURT), other proper nouns (MISC), aliases of concepts or references (ALIAS) and Tickers (TICKER).
In our opinion, the accompanying consolidated balance sheets and the related consolidated statements of operations, of changes in stockholders’ equity, and of cash flows present fairly, in all material respects, the financial position of SunGard Capital Corp. II and its subsidiaries ( SCC II ) at December 31, 2010, and 2009, and the results of their operations and their cash flows for each of the three years in the period ended December 31, 2010, in conformity with accounting principles generally accepted in the United States of America.
New Visual NLP + Finance NLP demos
Financial documents usually contain tables, forms and balance sheets, which should be analyzed at image level to keep the visual information, since converting them to text loses the layout in which the information is displayed.
By combining Visual NLP and Finance NLP, you can carry out Table Question Answering (first extracting the table with Visual NLP and then carrying out Table Understanding with Finance NLP) and Visual Question Answering.
Check out some of the new demos that combine these two John Snow Labs libraries.
Visual Question Answering on Balance Sheets
➤Q1: What type of report is this report? 1,189
➤Q2: How many Ordinary Stocks are in circulation as of February 16, 2010? 168,620
➤Q3: How many billion dollars is the total market value of the voting shares held by the Registrant’s non-affiliates as of June 28, 2009? 684
➤Q4: What is the title of each class? 338,923
Table Question Answering
What is specific to Table Question Answering (Table Understanding) is that the response can also encode mathematical operations, such as COUNT, AVERAGE, MEAN or SUM.
➤Q1: What is the sum of Employees (%) for Firm 1, 2, 3, 4, 5 and 6?
SUM(40.0000, 30.0000, 50.0000, 40.0000, 20.0000, 40.0000)
➤Q2: What is the average of Environment (%) for Firm 1, 2, 3, 4, 5 and 6?
AVERAGE(57.69, 38.46, 34.62, 23.08, 19.23, 3.85)
➤Q3: How many Firms, among Firm 1,2,3,4,5 and 6, have a Risk (%) value of more than 30.00?
COUNT(2, 3, 4)
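The responses above are expressions rather than final numbers. A minimal sketch of how such an expression could be turned into a value is shown below. It assumes the response always has the shape `OPERATION(v1, v2, ...)` and that COUNT means the number of selected cells. The `evaluate` helper and the `AGGREGATORS` table are hypothetical names and are not part of Finance NLP.

```python
import re
from statistics import mean

# Assumed mapping from operation names in the response to Python functions.
AGGREGATORS = {
    "SUM": sum,
    "AVERAGE": mean,
    "COUNT": len,  # assumption: COUNT returns how many cells were selected
}


def evaluate(expression):
    """Evaluate a response such as 'SUM(40.0, 30.0)' and return the result."""
    match = re.fullmatch(r"\s*([A-Z]+)\((.*)\)\s*", expression)
    if match is None:
        raise ValueError(f"unrecognized expression: {expression!r}")
    name, args = match.groups()
    if name not in AGGREGATORS:
        raise ValueError(f"unsupported operation: {name}")
    # Split the argument list on commas and convert each cell to a float.
    values = [float(v) for v in args.split(",") if v.strip()]
    return AGGREGATORS[name](values)


print(evaluate("SUM(40.0000, 30.0000, 50.0000, 40.0000, 20.0000, 40.0000)"))
print(evaluate("AVERAGE(57.69, 38.46, 34.62, 23.08, 19.23, 3.85)"))
print(evaluate("COUNT(2, 3, 4)"))
```

With the three responses from the demo, this gives 220.0 for the sum, roughly 29.49 for the average, and 3 for the count.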
We’ve got 30-day free licenses for you, with technical support from our financial team of technical experts and SMEs. This trial includes complete access to more than 135 models, including Classification, NER, Relation Extraction, Similarity Search, Summarization, Sentiment Analysis, Question Answering, etc., and 35+ financial language models.
Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!
Don’t forget to check our notebooks and demos.
How to run
Finance NLP is very easy to run on both clusters and driver-only environments using the johnsnowlabs package:
!pip install johnsnowlabs