Extracting tabular data is a common task, whether you’re analyzing financial statements, academic research papers, or clinical trial documentation. Tables vary widely in appearance, fonts, borders, and layout. This makes data extraction challenging even when the text is searchable, and more so when the table is only available as an image.
This webinar presents how Spark OCR automatically extracts tabular data from images. The end-to-end solution combines computer vision models for table detection and table structure recognition with OCR models that extract the text and numbers from each cell. The implemented approach provides state-of-the-art accuracy on the ICDAR 2013 and TableBank benchmark datasets.
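
To make the final stage of such a pipeline concrete, here is a minimal sketch in plain Python of how cell regions and OCR'd words might be merged into a table. It does not use the Spark OCR API and is not the implementation presented in the webinar. The names `Box`, `Cell`, `Word`, and `cells_to_grid` are hypothetical, and the coordinates are made up for illustration. The sketch assumes a structure recognition step has already produced one bounding box per cell with its row and column index, and that an OCR step has produced one bounding box per word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def center(self) -> tuple:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


@dataclass(frozen=True)
class Cell:
    """A cell as a structure recognition step might report it (assumed shape)."""
    row: int
    col: int
    box: Box


@dataclass(frozen=True)
class Word:
    """A word as an OCR step might report it (assumed shape)."""
    text: str
    box: Box


def cells_to_grid(cells, words):
    """Assign each word to the cell containing its center, then build a 2D grid."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    buckets = {(c.row, c.col): [] for c in cells}
    for w in words:
        cx, cy = w.box.center()
        for c in cells:
            if c.box.contains(cx, cy):
                buckets[(c.row, c.col)].append(w)
                break  # a word belongs to at most one cell
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for (r, c), ws in buckets.items():
        # Naive reading order: top-to-bottom, then left-to-right.
        ws.sort(key=lambda w: (w.box.y0, w.box.x0))
        grid[r][c] = " ".join(w.text for w in ws)
    return grid


# Made-up 2x2 table: two columns (0-100, 100-200) and two rows (0-20, 20-40).
cells = [
    Cell(0, 0, Box(0, 0, 100, 20)), Cell(0, 1, Box(100, 0, 200, 20)),
    Cell(1, 0, Box(0, 20, 100, 40)), Cell(1, 1, Box(100, 20, 200, 40)),
]
words = [
    Word("Revenue", Box(5, 5, 60, 15)), Word("2023", Box(110, 5, 150, 15)),
    Word("Net", Box(5, 25, 30, 35)), Word("sales", Box(35, 25, 70, 35)),
    Word("1,250", Box(110, 25, 160, 35)),
]
grid = cells_to_grid(cells, words)
```

Matching on word centers rather than full containment keeps the sketch tolerant of OCR boxes that slightly overlap a cell border. A real system would also need to handle merged cells, skewed images, and words that fall outside every detected cell.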