Docusign

“A Unified CV, OCR, and NLP approach for scalable document understanding”

INDUSTRY: Finance

Introduction: “DocuSign has been on a mission to accelerate business and simplify life for companies and people around the world. The company pioneered the development of e-signature technology, and today DocuSign helps organizations connect and automate how they prepare, sign, act on, and manage agreements.

The DocuSign team was looking to automate the extraction of structured information from document images.
Examples:

Contracts
Tax forms
Passport applications
Invoices
etc.”

Challenge: “The team faced 3 main challenges:

High and growing variation in layout
Unbounded field type complexity
Unstructured information

Computer vision (CV) poses its own unique challenges:

Size variation: objects may be very small, very large, or somewhere in between; they can be densely packed or relatively sparse; and they can have arbitrary aspect ratios
Context: objects can exhibit both long-range and short-range contextual dependencies
Density”

Solution: “DocuSign partnered with John Snow Labs to leverage its award-winning Spark NLP & OCR.

Humans create documents in whatever format best suits their immediate needs. Therefore, rules-based engines (template-based or position-based) will not scale. The ideal solution is to learn high-level representations from data using AI. This is where Spark NLP & OCR steps in.”
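
To illustrate why position-based rules struggle with layout variation, here is a minimal sketch in plain Python. It is not DocuSign's or John Snow Labs' implementation and does not use the Spark NLP or Spark OCR APIs. The `Word` type, the `TOTAL_BOX` coordinates, and both sample layouts are hypothetical, invented only for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with the (x, y) position of its anchor point (hypothetical)."""
    text: str
    x: float
    y: float


# Hypothetical template: the invoice total is expected inside this fixed box,
# given as (x_min, y_min, x_max, y_max) in page coordinates.
TOTAL_BOX = (400, 700, 560, 730)


def extract_by_position(words, box):
    """Return the text of every word whose anchor point falls inside the box."""
    x0, y0, x1, y1 = box
    hits = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    return " ".join(hits)


# Layout A matches the template: the amount sits inside TOTAL_BOX.
layout_a = [Word("Total:", 340, 715), Word("$1,250.00", 450, 715)]

# Layout B carries the same information, but the vendor placed it elsewhere.
layout_b = [Word("Total:", 60, 420), Word("$1,250.00", 170, 420)]

print(repr(extract_by_position(layout_a, TOTAL_BOX)))  # amount is found
print(repr(extract_by_position(layout_b, TOTAL_BOX)))  # nothing is found
```

Each new layout would need its own hand-tuned box, which is the scaling problem described above. A learned model can instead pick up the association between the "Total:" label and the nearby amount regardless of where the pair appears on the page.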