

“A Unified CV, OCR, and NLP Approach for Scalable Document Understanding”



Introduction: “DocuSign has been on a mission to accelerate business and simplify life for companies and people around the world. The company pioneered the development of e-signature technology, and today DocuSign helps organizations connect and automate how they prepare, sign, act on, and manage agreements.

The DocuSign team was looking to automate the extraction of structured information from document images, such as:

  • Contracts
  • Tax forms
  • Passport applications
  • Invoices
  • etc.”

Challenge: “The team faced three main challenges:

  • High and growing variation in layout
  • Unbounded field type complexity
  • Unstructured information

Computer vision (CV) poses its own unique challenges:

  • Size variation: objects may be very small, very large, or somewhere in between, and can have arbitrary aspect ratios
  • Context: objects can exhibit both long and short contextual dependencies
  • Density: objects can be densely packed or relatively sparse”

Solution: “DocuSign partnered with John Snow Labs to leverage its award-winning Spark NLP & OCR.

Humans create documents in whatever format best suits their immediate needs. Therefore, rule-based engines (template-based, position-based) will not scale. The ideal solution is to learn high-level representations from data using AI. This is where Spark NLP & OCR steps in.”
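
To illustrate why position-based rules struggle with layout variation, here is a minimal sketch in plain Python. It does not use Spark NLP or Spark OCR. The `Word` class, the `TOTAL_REGION` coordinates, and both sample layouts are hypothetical values invented for illustration, not data from the case study.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with the pixel position of its top-left corner."""
    text: str
    x: int
    y: int


# Hypothetical template rule: the invoice total is expected in this fixed box.
TOTAL_REGION = (400, 700, 600, 740)  # (x_min, y_min, x_max, y_max)


def extract_by_position(words, region):
    """Join the text of every word whose top-left corner lies inside region."""
    x_min, y_min, x_max, y_max = region
    hits = [
        w.text
        for w in words
        if x_min <= w.x <= x_max and y_min <= w.y <= y_max
    ]
    return " ".join(hits)


# Layout A matches the template: the amount sits inside the expected box.
layout_a = [Word("Total:", 320, 710), Word("$1,250.00", 420, 710)]

# Layout B holds the same information, but the total is near the top left.
layout_b = [Word("Total:", 60, 120), Word("$1,250.00", 160, 120)]

result_a = extract_by_position(layout_a, TOTAL_REGION)
result_b = extract_by_position(layout_b, TOTAL_REGION)

print(repr(result_a))  # the amount is found
print(repr(result_b))  # nothing is found, although the amount is on the page
```

Under these assumptions, every new layout would need its own hand-written region, which is the scaling problem the passage describes.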