In the first two posts of this series, we benchmarked OCR on two increasingly demanding tasks: Grounded (BBox) OCR, reading text AND returning its coordinates; and Image → Markdown OCR, plain-text...
Most OCR tools tell you what a document says. That’s fine for search indexing and RAG. But when your workflow needs to act on a specific piece of text (redact...
In our first benchmark, we showed that JSL Vision OCR is the #1 grounded OCR model overall, beating every closed-source frontier system on the FUNSD dataset. This post answers a different question: plain-text...
If you’ve shopped for an OCR model recently, you already know the problem: every vendor claims state-of-the-art accuracy, every benchmark uses a different dataset, and “VLMs can do OCR” is...
We benchmarked OpenAI Privacy Filter against a John Snow Labs de-identification pipeline on 381,959 tokens of real clinical text. The John Snow Labs pipeline reached 0.95 F1 on PHI detection...