NLP in Finance: Examining the Impact of Natural Language Processing in Financial and Banking Services

07.03.2023

Anber Arif

Data Scientist at John Snow Labs

How Natural Language Processing Can Improve Financial Services

According to Forbes, unstructured data is growing at 55-65% each year, and almost 90% of it has been generated in the last two years. More data demands more brains to process it. Today, companies use artificial intelligence (AI) approaches to spend less time on data discovery and more time on deriving insights from the data.

One of these approaches is Natural Language Processing (NLP), which helps companies make sense of unstructured data. NLP is a subfield of AI that helps computers understand human language. We use NLP every day when our phone autocorrects our spelling or suggests the next word of a message. Typical applications of NLP in finance are:

Classifying Financial Documents
Recognizing Financial Entities
Understanding Entities in Context
Extracting Financial Relationships
Normalization and Data Augmentation
Financial Deidentification
Financial Document Splitting

By 2025, almost 30% of Natural Language Processing applications will be carried out within Banking, Financial Services, and Insurance (BFSI). Banking has historically been a natural promoter of AI, and more specifically of NLP, for data automation.

In the finance market (BFSI sector), NLP is already used for:

Automatic loan and credit applications
Automatic calculations of fees
Customer onboarding
Risk Management
Asset Management, ESG
Compliance
Content Enrichment, etc.

The Role of Natural Language Processing in Financial Services

The finance and banking industry uses NLP for a variety of purposes, such as improved decision making, automation, and data enrichment. NLP in finance automates the manual process of turning unstructured data into a more usable form. Examples include information extraction from annual financial reports, Sentiment Analysis on financial news and on tweets about companies, ESG and asset management, and the capture of earnings calls and acquisition announcements.

It adds context to unstructured data and makes it more searchable and actionable. It also automates tedious tasks, reducing the need for human intervention.

The three main NLP approaches are the following.

Rule-based Approach – This approach classifies text using a set of handcrafted linguistic rules, each defining a list of words, patterns, or regular expressions organized into groups.
Machine Learning Approach Including Deep Learning – A technique that teaches computers to do what comes naturally to humans: learn by example. We provide the algorithms with large annotated datasets so that they learn to make predictions on unseen data.
Hybrid Approach – It gets the best of both worlds: rule-based, to leverage subject-matter expert (SME) knowledge and known domain rules, and Deep Learning, to learn by example.

At John Snow Labs, although our main approach is Deep Learning with Transformer architectures and Language Models, we often follow hybrid approaches.
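To make the hybrid idea concrete, here is a minimal sketch in plain Python. It is not how any John Snow Labs library works internally: the rules, the labels, and the `stub_model` function are all hypothetical and only illustrate the "rules first, learned model as fallback" pattern.

```python
import re
from typing import Callable, Optional

# Hypothetical handcrafted rules: each pairs a label with a regular expression.
RULES = [
    ("loan_agreement", re.compile(r"\b(borrower|lender|principal amount)\b", re.I)),
    ("consulting_agreement", re.compile(r"\bconsult(ant|ing) services\b", re.I)),
]


def rule_based_label(text: str) -> Optional[str]:
    """Return the label of the first rule that matches, or None if no rule fires."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


def hybrid_label(text: str, model: Callable[[str], str]) -> str:
    """Prefer a rule match; otherwise defer to a learned model."""
    label = rule_based_label(text)
    return label if label is not None else model(text)


def stub_model(text: str) -> str:
    # Stand-in for a trained classifier (an assumption for this sketch).
    return "other"


print(hybrid_label("The Borrower shall repay the Lender.", stub_model))
print(hybrid_label("Quarterly revenue rose.", stub_model))
```

In a real system the fallback would be a trained classifier, and the rules would encode domain knowledge that is cheap to write down but hard to learn from few examples.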

NLP Applications in Finance and Banking Sector

Below are the applications of Natural Language Processing in the finance industry.

Classification of Financial Documents

In today’s fast and complex ecosystem, financial information is difficult to manage, in part because the data is highly confidential and sensitive, so privacy is essential. We can use various NLP techniques to classify financial documents.

For instance, the finance industry uses text classification to predict various financial outcomes. It can also automatically classify different types of agreements (loan, service, consulting agreements, etc.).
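As a rough illustration of learning to classify agreements by example, the sketch below trains a tiny multinomial Naive Bayes classifier using only the Python standard library. This is a swapped-in textbook technique, not the Transformer-based models the article refers to, and the four training sentences and the two labels are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set.
texts = [
    "the borrower shall repay the loan principal and interest to the lender",
    "the lender agrees to lend the borrower the loan amount",
    "the consultant shall provide consulting advice to the client",
    "the client shall pay the consultant an hourly consulting fee",
]
labels = ["loan", "loan", "consulting", "consulting"]
model = NaiveBayes().fit(texts, labels)

print(model.predict("the borrower must repay interest on the loan"))
```

A production classifier would be trained on far more data and evaluated on held-out documents; the point here is only the shape of the train-then-predict workflow.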

Other use cases of NLP in financial document classification are:

ESG (Environmental, Social, and Governance) news classification

Forward-looking statements classification – NLP in finance detects forward-looking statements in financial texts, such as 10-K filings or annual reports.

Identifying topics about banking – NLP helps classify banking-related texts into various categories as shown below.

Classifying customer support tickets (banking) – NLP models can classify the topic/class of a complaint about a bank-related product.

Receipt binary classification – We can use Financial Image Transformers (ViT) in Spark NLP to detect receipts in both scanned and mobile images.

Classifying different SEC (Securities and Exchange Commission) filings – We can use pretrained NLP models for banking and finance to classify SEC filings such as 10-K, 10-Q, 8-K, etc.

We can also use Sentiment Analysis to analyze large volumes of textual data and understand how the entities in it are discussed. Sentiment Analysis is an NLP technique that companies use for tasks such as analyzing reports and customer feedback and gauging market sentiment.
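The simplest form of Sentiment Analysis is lexicon-based scoring, sketched below. The two word lists are hypothetical and far too small for real use, and pretrained models handle context (negation, sarcasm, domain phrasing) much better; this only shows the basic positive/negative/neutral decision.

```python
import re

# Hypothetical mini-lexicons for financial headlines.
POSITIVE = {"beat", "growth", "profit", "record", "surge", "upgrade"}
NEGATIVE = {"loss", "decline", "default", "downgrade", "lawsuit", "miss"}


def sentiment(headline):
    """Label a headline by counting positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for headline in [
    "Acme posts record profit",
    "Beta faces lawsuit after loan default",
    "Gamma holds annual meeting",
]:
    print(headline, "->", sentiment(headline))
```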

For instance, we can identify sentiment (neutral, positive or negative) in financial news as shown below.

Recognizing Financial Entities

NLP helps us identify and classify named entities in text, such as people, locations, dates, numbers, etc. to make recommendations or predictions. Named Entity Recognition (NER) is an NLP approach that finds and extracts entities from unstructured textual documents. For instance, it can recommend solutions based on news articles about a particular organization.

We can also use it to extract investment signals from news headlines. Banks and NBFCs (Non-Banking Financial Companies) use NER to extract key information from customer data.

Below are the use cases of Named Entity Recognition in Finance.

Extracting financial entities from annual reports, such as expenses, losses, and profit declines or increases.

Extracting ORG (Organization names) and PRODUCT (Product names).
Extracting information like Company Name, Trading symbols, Stock markets, Addresses, Phones, Stock types and values, IRS, CFN, etc. from the first page of 10-K filings.

Identifying ORG (Companies), their ALIAS (other names the company uses in financial reports) and company PRODUCTS.
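For entities with a regular surface form, a rule-based extractor is often enough as a first pass. The sketch below uses regular expressions to pull money amounts, percentages, and trading symbols out of a sentence. The patterns, the labels, and the company "Acme Corp" are hypothetical; entities such as ORG or PRODUCT usually need a trained NER model instead.

```python
import re

# Hypothetical patterns for entities with a predictable format.
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "TICKER": re.compile(r"\((?:NASDAQ|NYSE):\s*([A-Z]{1,5})\)"),
}


def extract_entities(text):
    """Return (label, value) pairs, grouped by label in PATTERNS order."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one.
            value = match.group(1) if pattern.groups else match.group(0)
            entities.append((label, value))
    return entities


text = "Acme Corp (NASDAQ: ACME) reported revenue of $12.5 million, up 8.2% year over year."
print(extract_entities(text))
# [('MONEY', '$12.5 million'), ('PERCENT', '8.2%'), ('TICKER', 'ACME')]
```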

We can also perform Financial Zero-shot Named Entity Recognition. For instance, we can use prompts in the form of questions to carry out Named Entity Recognition without any annotated training dataset.

Understanding Entities in Context

Understanding Entities in Context is the ability to assert whether an entity is mentioned as occurring in the past, present, or future, whether it is negated, present, or absent, and whether it is hypothetical, probable, etc.

For instance, we can use Assertion Status to identify:

If a PRODUCT or an ORG is mentioned as a competitor.

If a mention of an Organization, Job Title or Date refers to the past.
If financial information is described as happening in the present, past, or future, or as merely possible.
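A very rough way to approximate assertion status is to look for cue words near the entity. The sketch below does that with a fixed character window; the cue lists, the status names, and the window size are assumptions chosen for illustration, and trained assertion models are considerably more robust.

```python
import re

# Hypothetical cue words, checked in priority order.
CUES = [
    ("negated", re.compile(r"\b(no|not|never|without)\b", re.I)),
    ("possible", re.compile(r"\b(may|might|could|expects? to|plans? to)\b", re.I)),
    ("future", re.compile(r"\b(will|shall)\b", re.I)),
    ("past", re.compile(r"\b(was|were|had|formerly|previously)\b", re.I)),
]


def assertion_status(sentence, entity, window=40):
    """Guess the status of `entity` from cue words within `window` characters of it."""
    idx = sentence.lower().find(entity.lower())
    if idx == -1:
        raise ValueError(f"entity {entity!r} not found in sentence")
    context = sentence[max(0, idx - window): idx + len(entity) + window]
    for status, pattern in CUES:
        if pattern.search(context):
            return status
    return "present"  # default when no cue is found


print(assertion_status(
    "John Smith was previously Chief Financial Officer at Acme Corp.",
    "Chief Financial Officer",
))
print(assertion_status("The company may acquire Beta Ltd next year.", "Beta Ltd"))
```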

Extracting Financial Relationships

Relation Extraction is the ability to infer if two entities are connected. It helps us extract relations between a company and its profit, losses, cash flow operations, etc. Also, it allows us to do the following:

Extract relations between ORG (Companies), PRODUCT (Products) and their ALIAS in financial documents.

Extract Acquisition and Subsidiary relations from ORG (Companies), ALIAS (Aliases of companies in an agreement) and PRODUCT (Products).

Extract Relationships About People’s Job Experiences. For instance, the figure below shows how we can group together entities such as PERSON, DATE, ORG (Organizations) and ROLE (job titles) to understand the present and past job experiences of employees.

Financial Relation Extraction on 10-K filings – The figure below shows that the model extracts relations between amounts, counts, percentages, dates and the financial entities extracted with `finner_financial` models.

Financial Zero-shot Relation Extraction – The figure below shows how we can carry out Relation Extraction without training any model, just with some textual examples.
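Separately from the pretrained and zero-shot models listed above, the basic idea of Relation Extraction can be sketched with surface patterns. The example below finds acquisition and subsidiary relations between capitalized names. The patterns, relation names, and companies are hypothetical, and the naive definition of an organization name (a run of capitalized words) would need a real NER step in practice.

```python
import re

# Naive stand-in for an ORG entity: one or more capitalized words.
ORG = r"([A-Z][\w&]*(?: [A-Z][\w&]*)*)"

# Hypothetical relation patterns: (relation name, compiled pattern).
RELATION_PATTERNS = [
    ("acquired", re.compile(
        ORG + r" (?:acquired|has acquired|completed the acquisition of) " + ORG)),
    ("subsidiary_of", re.compile(
        ORG + r" is a (?:wholly owned )?subsidiary of " + ORG)),
]


def extract_relations(text):
    """Return (subject, relation, object) triples found by the patterns."""
    relations = []
    for name, pattern in RELATION_PATTERNS:
        for match in pattern.finditer(text):
            relations.append((match.group(1), name, match.group(2)))
    return relations


text = ("Acme Corp acquired Beta Ltd in 2021. "
        "Gamma Inc is a wholly owned subsidiary of Acme Corp.")
print(extract_relations(text))
# [('Acme Corp', 'acquired', 'Beta Ltd'), ('Gamma Inc', 'subsidiary_of', 'Acme Corp')]
```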

Normalization & Data Augmentation

Text data is preprocessed into a suitable form before it is used to train NLP models. Normalization reduces variation in word forms and improves the model’s performance. When we normalize text, we reduce its randomness and bring it closer to a predefined standard. We also reduce the variability of, for example, company names, so that we can disambiguate them and match or link them with other databases, such as SEC EDGAR.

Data Augmentation in our libraries is the ability to use extracted information, such as company names, to query data sources and obtain more information, such as a company’s SIC code, trading symbol, address, or financial period.
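The two steps can be sketched together: normalize a company name to a canonical key, then use that key to look up extra fields. In the sketch below the suffix list is a simplification, and `REFERENCE` is a hypothetical in-memory table with placeholder values standing in for an external source such as SEC EDGAR; no real lookup API is implied.

```python
import re

# Legal-form suffixes to strip during normalization (a simplified list).
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "co", "company",
    "ltd", "limited", "llc", "plc",
}


def normalize_company(name):
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9&]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Hypothetical reference table keyed by normalized name; values are placeholders.
REFERENCE = {
    "acme": {"ticker": "ACME", "sic_code": "SIC-PLACEHOLDER"},
}


def augment(name):
    """Return extra fields for a company mention, or None if it is unknown."""
    return REFERENCE.get(normalize_company(name))


for mention in ["ACME Corp.", "Acme, Inc.", "Acme Corporation"]:
    print(mention, "->", normalize_company(mention), augment(mention))
```

All three spellings map to the same key, which is what makes the downstream lookup possible.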

The figure below shows how the NLP model augments NER with information from external sources.

Financial Deidentification

De-identification is a general term for any process that removes the association between a set of identifying data and the data subject. It consists of algorithms and processes that can be applied to documents, records, and data to remove any information that could lead to the identification of the person the document concerns. It protects individuals’ privacy when documents are accessed by people who should not know their identity.
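As a minimal sketch of masking, the example below replaces e-mail addresses, US-style phone numbers, and long digit runs (treated here as account numbers) with placeholders. The patterns and placeholder names are assumptions for illustration; pattern-only masking misses names, addresses, and many other identifiers, so it is not sufficient on its own for regulatory compliance.

```python
import re

# Hypothetical masking rules, applied in order: (placeholder, pattern).
MASKS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<PHONE>", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("<ACCOUNT>", re.compile(r"\b\d{8,17}\b")),
]


def deidentify(text):
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in MASKS:
        text = pattern.sub(placeholder, text)
    return text


print(deidentify("Contact jane.doe@example.com or 555-123-4567 about account 123456789012."))
# Contact <EMAIL> or <PHONE> about account <ACCOUNT>.
```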

The figure shows the deidentification/masking of financial data to comply with data privacy regulations such as GDPR and CCPA.

Financial Document Splitting

In NLP, splitting is the process of dividing text into smaller pieces, such as sections, paragraphs, sentences, or phrases. For instance, we can use NER to detect headers and subheaders in financial documents.
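When headers follow a predictable format, a regular expression can stand in for the NER-based header detection described above. The sketch below splits a text on lines that look like "Item 1." or "Item 1A." headings, a layout commonly seen in 10-K filings. The header pattern and the sample text are assumptions for illustration.

```python
import re

# Assumed header format: "Item <number><optional letter>. <title>" on its own line.
HEADER = re.compile(r"^(Item \d+[A-Z]?\.[ \t]+.+)$", re.M)


def split_by_headers(text):
    """Return (header, body) pairs, one per detected section."""
    sections = []
    matches = list(HEADER.finditer(text))
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        sections.append((match.group(1).strip(), body))
    return sections


sample = (
    "Item 1. Business\n"
    "We sell widgets.\n"
    "Item 1A. Risk Factors\n"
    "Demand may fall.\n"
)
for header, body in split_by_headers(sample):
    print(header, "|", body)
```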

Text Summarization

This approach generates a concise summary of a text document. We can use it to extract insights and useful relationships between entities from financial reports and news articles.

In Finance, Text Summarization helps us extract headers from financial news and summarize financial news.
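One classic, simple way to summarize is extractive: score each sentence by how frequent its words are in the whole text and keep the top-scoring ones. The sketch below implements that idea with the standard library only. It is a swapped-in baseline technique, not the model-based summarization the article refers to, and the stopword list and sample text are made up.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list.
STOPWORDS = {"the", "was", "by", "this", "a", "of", "and", "in", "to"}


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Keep the n sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


news = ("Acme reported higher revenue this quarter. "
        "Revenue growth was driven by loan revenue. "
        "The weather was mild.")
print(summarize(news, n_sentences=1))
# Revenue growth was driven by loan revenue.
```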

Financial Q&A

We can do question-answering over financial data using NLP techniques.

Financial NLP & Visual NLP

Most financial documents are multimodal, including unstructured text, tables, forms, and combinations of unstructured and structured information together (text inside tables).

Multimodal scenarios can be solved by using:

Visual NLP for Table and Form Extraction, and Financial Document Understanding

Extracting tables from selectable PDF documents with Spark OCR

Detecting and extracting structured tables from scanned PDF documents & images with Spark OCR

Classifying finance documents using text and layout data with the new features offered by Spark OCR

Detecting companies, total amounts and dates in scanned invoices using out of the box Spark OCR models

Perceiving data in forms as key-value pairs

Spark NLP for Finance for Financial Q&A on tables

Limitations of NLP in Financial Services

Natural Language Processing in finance automates processes, reduces errors, provides 24/7 customer support, and boosts revenue. However, the finance industry faces some challenges when using NLP.

Data Privacy

One of the biggest challenges faced by banking and financial institutions is data privacy. Banks hold sensitive customer data that must be protected. When they use AI and NLP, they may have to share data with third-party providers to train the machine learning algorithms. This raises security concerns such as:

Who will have access to the data?
How will the data be used?

Further, there are regulatory concerns when using NLP in the banking and finance sector. For instance, there can be fears about biased decision-making when a bank uses NLP to make lending decisions. Therefore, transparency needs to be ensured around how NLP systems make decisions.

Data Quality

NLP systems need large amounts of data to work effectively, but banks may not have sufficient data on certain products or customers. Further, machine learning models need clean and well-structured data as input, and the data available to banks may not be of high quality. This creates the need for data cleaning processes, which are expensive and time-consuming.

No Justification for Rejections

NLP-based systems for financial services can significantly impact a person’s life. For example, a rejected loan request can have a huge effect on a customer’s future.

If the system cannot detect its own bias and only analyzes information according to its design, how can financial institutions explain a rejection to clients? Without a proper justification, it is difficult for them to defend their decision.

High Cost of Investment

Small organizations often cannot afford advanced NLP-based systems, as they are quite expensive. Apart from the software and additional hardware costs, regular updates need to be scheduled and implemented, and systems can be unavailable for an extended period of time if there’s a problem with an update.

The Future of NLP in Finance

The key benefits of using NLP in finance are:

Accuracy
Consistency
Scaling
Efficiency
Process Automation

Natural Language Processing has transformed a number of industries, such as Healthcare, Education, Business, Data Science, Banking, and Finance. Banks use NLP-powered chatbots to enhance communication with customers and better answer their queries. Financial institutions use NLP to manage risks and automate routine tasks.

In the future, NLP will help banks identify new revenue streams, make lending decisions, and provide personalized financial advice. As NLP evolves, it will have a profound impact on the financial industry.

Younger consumers prefer digital banking channels. According to a report, 70% of US respondents support digital banking, as it has become the primary way to access accounts. This indicates that NLP implementation is critical for financial institutions to remain successful and competitive.

However, there are certain risks associated with using Natural Language Processing in the financial sector. NLP algorithms for decision-making are hard for humans to comprehend. Further, there is a risk to human employment, as NLP can replace human workers in various roles.

In the future, NLP-powered systems may have access to sensitive data, such as health information and financial records, that could be used to violate our privacy rights.

State of the Art NLP for Finance

John Snow Labs built Finance NLP – a dedicated library that contains a series of new pretrained models and state-of-the-art algorithms, able to carry out Entity Recognition, Relation Extraction, Assertion Status Detection, Entity Resolution, De-identification, Text Classification, and more. Spark NLP is used by 50% of practitioners in the Finance industry, signaling a demand for a dedicated offering.

John Snow Labs commands a 59% market share in Healthcare & Life Science, with customers including half of the world’s top 10 Pharmaceutical companies and the three largest US Healthcare companies, among others. Many of the same challenges found in Healthcare (highly domain-specific language, stringent privacy and compliance regulations, and a mix of structured and unstructured data) apply to the Financial industry.

With more than 60 finance models, John Snow Labs is powering new and innovative NLP applications.

Conclusion

The highly specific jargon and nuanced semantics of financial documents, paired with the sheer amount of text these industries generate, present a massive opportunity for natural language processing to help automate, simplify, and optimize operations. NLP in Finance enables that by providing current state-of-the-art accuracy, a broad set of out-of-the-box models for common use cases, and ease of building them into production systems.

Finance NLP is supported on all major data platforms including public cloud providers, Databricks, Kubernetes, on-premise, or on single machines. One-click installation with a 30-day free trial is available through AWS Marketplace and Azure Marketplace.



Anber Arif is a Data Science and AI enthusiast who has always been fascinated by the power of data to drive insights and make informed decisions. She started her journey in Data Science during her undergraduate studies in Software Engineering, where she took courses in Statistics and Machine Learning. She deepened her knowledge in the field and worked on various projects in Artificial Intelligence, Natural Language Processing, and Computer Vision. Apart from that, Anber is an experienced technical writer with a proven track record of creating clear, concise, and accurate technical content for a variety of audiences and industries. She has worked on user manuals, technical guides, API documentation, and more. She is working as an AI copywriter at John Snow Labs. She has excellent communication skills and is able to work closely with software developers, product managers, and other stakeholders to ensure that the documentation is accurate, comprehensive, and meets the needs of the users.