Introduction: DocuSign has been on a mission to accelerate business and simplify life for companies and people around the world. The company pioneered the development of e-signature technology, and today DocuSign helps organizations connect and automate how they prepare, sign, act on, and manage agreements.
The DocuSign team was looking to automate the extraction of structured information from document images, such as:
- Tax forms
- Passport applications
Challenge: The team faced three main challenges:
- High and growing variation in layout
- Unbounded field type complexity
- Unstructured information
Computer vision (CV) problems pose their own unique challenges:
- Size variation: objects may be very small, very large, or somewhere in between; they can be densely packed or relatively sparse; and they can have arbitrary aspect ratios
- Context: objects can exhibit both long-range and short-range contextual dependencies
Solution: DocuSign partnered with John Snow Labs to leverage its award-winning Spark NLP & OCR.
Humans create documents in whatever format best suits their immediate needs. Rules-based engines (template-based or position-based) therefore do not scale. The ideal solution is to learn high-level representations from data using AI. This is where Spark NLP & OCR steps in.