Classification of Unstructured Documents into the Environmental, Social & Governance (ESG) Taxonomy using Spark NLP

White Paper

An immense amount of unstructured data that can affect companies and their market position is generated every day. As this information grows continuously, decision-makers face the critical task of processing, quantifying, and analyzing it to identify opportunities and risks.

One important indicator in this kind of analysis is the ESG (environmental, social, and governance) rating, which identifies issues a company faces in these critical areas.

This White Paper describes how this analysis is performed automatically on documents continuously ingested from news sources around the world. The models have been deployed in production as part of a big data analytics platform at a leading data provider to the financial services industry.

To perform this analysis effectively across a massive number of data sources, John Snow Labs’ Spark NLP is used to analyze incoming documents automatically and detect material ESG events. The goal of the machine learning pipeline is to identify material ESG events in unstructured data records and tag them correctly.
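
To make the tagging task concrete, here is a minimal sketch in plain Python. It is not the Spark NLP pipeline described in the White Paper: it swaps the trained models for a simple keyword lookup, and the keyword lists, the `tag_document` function, and the sample records are all hypothetical, chosen only to illustrate what "tagging a record with ESG categories" means.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists used purely for illustration.
# A production system would rely on trained classifiers, not keyword matching.
ESG_KEYWORDS: Dict[str, List[str]] = {
    "Environmental": ["emissions", "pollution", "spill", "deforestation"],
    "Social": ["strike", "discrimination", "layoffs", "safety"],
    "Governance": ["bribery", "fraud", "board", "audit"],
}


def tag_document(text: str) -> List[str]:
    """Return every ESG category whose keywords appear in the text."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # Keep a category if at least one of its keywords is among the tokens.
    return [
        label
        for label, keywords in ESG_KEYWORDS.items()
        if tokens & set(keywords)
    ]


if __name__ == "__main__":
    # Hypothetical news snippets standing in for ingested records.
    records = [
        "Regulators fined the firm over an oil spill and alleged bribery.",
        "Quarterly revenue rose 4%.",
    ]
    for record in records:
        print(tag_document(record), "<-", record)
```

A record can receive several tags or none, which mirrors the idea that only some incoming documents contain material ESG events.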
