Accurate Table Extraction from Documents & Images with Spark OCR

Extracting data formatted as a table (tabular data) is a common task, whether you're analyzing financial statements, academic research papers, or clinical trial documentation. Tables vary heavily in appearance, fonts, borders, and layout. This makes data extraction challenging even when the text is searchable, and harder still when the table is only available as an image. This webinar shows how Spark OCR automatically extracts tabular data from images. The end-to-end solution combines computer vision models for table detection and table structure recognition with OCR models that extract the text and numbers in each cell. The approach achieves state-of-the-art accuracy on the ICDAR 2013 and TableBank benchmark datasets.
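To make the last stage of such a pipeline concrete, here is a minimal pure-Python sketch of how OCR output can be placed into cells once the table structure is known. It is not Spark OCR's API: it assumes the structure-recognition step has already produced row and column boundaries as `(start, end)` pixel intervals and that OCR has produced word bounding boxes. The `Word` class and the `find_index` and `assemble_table` functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A hypothetical OCR result: recognized text plus its bounding box."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def find_index(spans, value):
    """Return the index of the (start, end) span containing value, or None."""
    for i, (start, end) in enumerate(spans):
        if start <= value < end:
            return i
    return None


def assemble_table(rows, cols, words):
    """Place OCR'd words into a grid defined by row and column spans.

    Each word is assigned to the cell containing its box center. Words are
    visited top-to-bottom, left-to-right so multi-word cells read in order.
    """
    grid = [["" for _ in cols] for _ in rows]
    for w in sorted(words, key=lambda w: (w.y0, w.x0)):
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        r = find_index(rows, cy)
        c = find_index(cols, cx)
        if r is None or c is None:
            continue  # the word lies outside the detected table area
        grid[r][c] = (grid[r][c] + " " + w.text).strip()
    return grid


# Usage: a 2x2 table whose boundaries came from structure recognition.
rows = [(0, 20), (20, 40)]
cols = [(0, 50), (50, 100)]
words = [
    Word("Revenue", 2, 2, 40, 18),
    Word("2021", 55, 2, 80, 18),
    Word("Net", 2, 22, 20, 38),
    Word("income", 22, 22, 48, 38),
    Word("1,200", 55, 22, 90, 38),
]
table = assemble_table(rows, cols, words)
print(table)  # → [['Revenue', '2021'], ['Net income', '1,200']]
```

Assigning words by box center is a simple heuristic; it tolerates small overlaps between a word and neighboring cells but does not handle cells that span several rows or columns.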

New Spark OCR 3.12: Handwritten Text Recognition and Spark 3.2 support

This release comes with new models for Handwritten Text Recognition, Spark 3.2 support, bug fixes, and notebook examples. Added to the ImageTextDetectorV2...