By all accounts, John Snow Labs has created the most accurate software in history to extract facts from unstructured text.
Proven Customer Success
What’s in the box
Visual NLP in Action
Combine computer vision, OCR, and NLP models to classify documents, extract normalized entities and figures, find signatures on forms, extract data from tables, and de-identify images.
Extract and normalize specific facts and figures from custom images and forms by training your own models to learn where each fact appears in the image, which words it sits next to, and how it is formatted.
Find tables in images, visually identify rows and columns, and extract data from cells into data frames. Turn scans from financial disclosures, academic papers, lab results and more into usable data.
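To make the idea of turning visually detected rows and columns into cell data concrete, here is a minimal sketch in plain Python. It is not the Visual NLP API: the function `words_to_table`, the `(text, x, y)` word-box format, and the pixel tolerances are all assumptions chosen for illustration. It groups OCR word positions into rows and columns by clustering their coordinates.

```python
def cluster(values, tol):
    """Group sorted coordinates; start a new group when the gap exceeds tol."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def words_to_table(words, row_tol=10, col_tol=40):
    """Arrange (text, x, y) word boxes into a row-major grid of cell strings.

    Hypothetical helper: x and y are the top-left pixel coordinates of a word.
    """
    row_groups = cluster({y for _, _, y in words}, row_tol)
    col_groups = cluster({x for _, x, _ in words}, col_tol)

    def index_of(value, groups):
        # Position of the group that contains this coordinate
        return next(i for i, g in enumerate(groups) if value in g)

    grid = [["" for _ in col_groups] for _ in row_groups]
    # Visit words top-to-bottom, left-to-right so multi-word cells read in order
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        r, c = index_of(y, row_groups), index_of(x, col_groups)
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid


# Made-up word boxes, as an OCR step might report them for a small table
sample_words = [
    ("Item", 10, 5), ("Amount", 200, 6),
    ("Revenue", 12, 40), ("1200", 205, 42),
    ("Costs", 11, 75), ("800", 203, 74),
]
table = words_to_table(sample_words)
```

The resulting list of rows can be handed to any data frame constructor. Real scans usually need more than fixed tolerances (ruling lines, merged cells, multi-line cells), which is where trained table-detection models come in.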
End-to-end example of a regular NER pipeline: import scanned images from cloud storage, preprocess them to improve their quality, recognize text with Spark OCR, correct spelling mistakes to improve the OCR results, and finally run NER to extract entities.
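The spelling-correction step in this pipeline can be illustrated with a small, library-independent sketch. It uses only Python's standard `difflib` module and is not how Spark OCR implements the step; the function `correct_tokens`, the toy vocabulary, and the similarity cutoff are assumptions made for the example.

```python
import difflib


def correct_tokens(tokens, vocabulary, cutoff=0.8):
    """Replace each unknown token with its closest vocabulary word, if one is close enough."""
    corrected = []
    for token in tokens:
        if token.lower() in vocabulary:
            corrected.append(token)
            continue
        # get_close_matches returns the best candidates above the similarity cutoff
        match = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return corrected


# Toy vocabulary and OCR output with typical digit-for-letter confusions
vocabulary = ["patient", "was", "admitted", "hospital"]
ocr_tokens = ["pat1ent", "was", "adm1tted"]
fixed_tokens = correct_tokens(ocr_tokens, vocabulary)
```

Cleaning tokens before NER matters because entity models are typically trained on correctly spelled text. A production corrector would also take context into account rather than matching words in isolation.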
Correcting the skew of your scanned documents will greatly improve OCR results. Spark OCR is the only library that allows you to fine-tune the image preprocessing for excellent OCR results.
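For readers curious how a skew angle can be estimated at all, the sketch below uses the classic projection-profile idea: try a range of candidate angles and keep the one that makes text pixels line up most sharply into horizontal rows. This is a generic technique shown under simplifying assumptions, not Spark OCR's implementation; `profile_score`, `estimate_skew`, the point-set image representation, and the synthetic page are all hypothetical.

```python
import math


def profile_score(points, skew_deg):
    """Score how sharply foreground pixels fall into rows once a skew of skew_deg is undone."""
    a = math.radians(skew_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    counts = {}
    for x, y in points:
        # y-coordinate after rotating the point by -skew_deg around the origin
        row = round(y * cos_a - x * sin_a)
        counts[row] = counts.get(row, 0) + 1
    # The sum of squares is largest when pixels pile up in a few rows
    return sum(c * c for c in counts.values())


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (in degrees) with the sharpest row profile."""
    steps = int(round(2 * max_angle / step))
    candidates = [i * step - max_angle for i in range(steps + 1)]
    return max(candidates, key=lambda s: profile_score(points, s))


# Synthetic "page": three text baselines tilted by 3 degrees
tilt = math.tan(math.radians(3.0))
points = [(x, round(y0 + x * tilt)) for y0 in (20, 60, 100) for x in range(200)]
estimated_skew = estimate_skew(points)
```

Once the angle is known, rotating the image by its negative straightens the text lines. The search range and step size are the kind of preprocessing parameters worth tuning for a given batch of scans.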
Removing background noise from a scanned document will greatly improve OCR results. Visual NLP is the only OCR tool that allows you to fine-tune the image preprocessing for excellent OCR results.
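As a simple illustration of what noise removal does to a scan, the following sketch applies a median filter to a tiny grayscale image stored as nested lists. A median filter is one common way to remove isolated specks; it is used here as a stand-in and is not claimed to be the method Visual NLP applies. The function `median_filter` and the 5x5 test image are assumptions made for the example.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Replace each pixel with the median of its neighborhood (clipped at the borders)."""
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # median_low always returns an actual pixel value from the window
            output[y][x] = median_low(window)
    return output


# A white 5x5 patch (255) with a single dark speck (0) in the middle
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = median_filter(noisy)
```

The `radius` parameter controls the trade-off: a larger window removes bigger blotches but also starts to erode thin character strokes, which is why this kind of preprocessing benefits from tuning per document type.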
Detect signatures in image-based documents.