OCR Evaluation Tools for the 21st Century

Eddie A Santos

doi:10.33011/computel.v1i.345

Open Access

https://doi.org/10.33011/computel.v1i.345

Journal: Proceedings of the Workshop on Computational Methods for Endangered Languages
Publication Date: Jan 1, 2019
Citations: 10

Affiliation: National Research Council Canada

Abstract

We introduce ocreval, a port of the ISRI OCR Evaluation Tools that adds Unicode support, and describe how we upgraded the original tools to handle modern text processing tasks. ocreval produces character-level and word-level accuracy reports for all characters representable in the UTF-8 character encoding scheme. To support word-level accuracy reports for a broad range of writing systems, we have implemented the Unicode default word boundary specification. We argue that character-level and word-level accuracy reports produce confusion matrices that are useful for tasks beyond OCR evaluation, including tasks supporting the study and computational modeling of endangered languages.

Full Text