AN ALGORITHM FOR MATCHING OCR-GENERATED TEXT STRINGS

Stephen V Rice, Junichi Kanai, Thomas A Nartker

doi:10.1142/s0218001494000632

Journal: International Journal of Pattern Recognition and Artificial Intelligence
Publication Date: Oct 1, 1994
Citations: 10

Affiliation: University of Nevada Reno


Abstract

When optical character recognition (OCR) devices process the same page image, they generate similar text strings. Differences are due to recognition errors. A page of text rarely contains long repeated substrings; therefore, N strings generated by OCR devices can be quickly matched by detecting long common substrings. An algorithm for matching an arbitrary number of strings based on this principle is presented. Although its worst-case performance is O(Nn^2), its performance in practice has been observed to be O(Nn log n), where n is the length of a string. This algorithm has been successfully used to study OCR errors, to determine the accuracy of OCR devices, and to implement a voting algorithm.
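
The principle of matching by long common substrings can be illustrated with a minimal sketch. This is not the authors' algorithm: it handles only two strings rather than N, it relies on Python's standard `difflib.SequenceMatcher` to find each longest common substring, and it makes no claim about the running times stated above. The function name `match_strings` and the parameter `min_len` are hypothetical names chosen for this illustration.

```python
from difflib import SequenceMatcher


def match_strings(a, b, min_len=1):
    """Align two OCR outputs by repeatedly anchoring on long common substrings.

    Returns a sorted list of (i, j, size) blocks with a[i:i+size] == b[j:j+size].
    Illustrative two-string sketch only, not the algorithm from the paper.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    blocks = []
    # Each stack entry is a pair of unmatched regions: a[alo:ahi] and b[blo:bhi].
    stack = [(0, len(a), 0, len(b))]
    while stack:
        alo, ahi, blo, bhi = stack.pop()
        m = matcher.find_longest_match(alo, ahi, blo, bhi)
        if m.size == 0 or m.size < min_len:
            continue  # no usable anchor in this region
        blocks.append((m.a, m.b, m.size))
        # Recurse on the text to the left and to the right of the anchor.
        stack.append((alo, m.a, blo, m.b))
        stack.append((m.a + m.size, ahi, m.b + m.size, bhi))
    return sorted(blocks)


if __name__ == "__main__":
    first = "The quick brown fox"
    second = "The qu1ck brown f0x"
    for i, j, size in match_strings(first, second):
        print(i, j, repr(first[i:i + size]))
```

The gaps between consecutive matched blocks are the positions where the two outputs disagree, which is where recognition errors would be looked for. Raising `min_len` makes the sketch ignore short, possibly coincidental matches, at the cost of leaving more text unaligned.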
