Attributed point matching for automatic groundtruth generation

Doe-Wan Kim, Tapas Kanungo

doi:10.1007/s10032-002-0083-7

Abstract

Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition (OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically registered the original image to the rescanned one using four corner points and then transformed the original groundtruth using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm that establishes the point correspondence using all the data points. We group the original feature points into blobs and use the corners of the blobs for matching. The Euclidean distance between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout complexity to characterize the performance of two matching algorithms. We also report results of experiments conducted on the University of Washington dataset. Finally, we show examples of applying this methodology to generate groundtruth for microfilmed and FAXed versions of documents from the University of Washington dataset.
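
The abstract combines three ingredients: an attribute-constrained point correspondence found by branch-and-bound, a registration transformation estimated from the matched points, and the Euclidean distance between centroids as the error metric. The Python sketch below illustrates these ideas under stated assumptions and is not the authors' algorithm, whose attributes and bounding strategy are not described here. It assumes an affine transformation model, a coarse initial transform supplied by the caller (identity in the example), point attributes given as simple string labels such as a hypothetical blob-corner type ("tl", "tr", ...), and a matching cost equal to the summed distances under that coarse transform. All function names (`match_attributed`, `fit_affine`, `mean_centroid_error`) and the sample data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# Assumed affine model: x' = a*x + b*y + c, y' = d*x + e*y + f
Affine = Tuple[float, float, float, float, float, float]


def apply_affine(T: Affine, p: Point) -> Point:
    a, b, c, d, e, f = T
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(M: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system M x = v by Cramer's rule."""
    D = _det3(M)
    if abs(D) < 1e-12:
        raise ValueError("degenerate (e.g. collinear) point configuration")
    sol = []
    for k in range(3):
        # Replace column k of M with v.
        Mk = [[v[i] if j == k else M[i][j] for j in range(3)] for i in range(3)]
        sol.append(_det3(Mk) / D)
    return sol


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares affine map taking src[i] onto dst[i] (needs >= 3 non-collinear points)."""
    AtA = [[0.0] * 3 for _ in range(3)]
    Atu, Atw = [0.0] * 3, [0.0] * 3
    for (x, y), (u, w) in zip(src, dst):
        row = (x, y, 1.0)  # one row of the design matrix A
        for i in range(3):
            for j in range(3):
                AtA[i][j] += row[i] * row[j]
            Atu[i] += row[i] * u
            Atw[i] += row[i] * w
    a, b, c = _solve3(AtA, Atu)  # normal equations for x'
    d, e, f = _solve3(AtA, Atw)  # normal equations for y'
    return (a, b, c, d, e, f)


def match_attributed(src, dst, T: Affine) -> Optional[List[int]]:
    """Branch-and-bound search for a one-to-one assignment src[i] -> dst[j].

    src and dst hold ((x, y), label) pairs. Only equal labels may be paired,
    and the cost is the summed Euclidean distance between T(src point) and
    its partner. Partial sums never decrease, so a branch is pruned as soon
    as its partial cost reaches the best complete cost found so far.
    Returns None if no label-consistent assignment exists.
    """
    best_cost, best = math.inf, None
    used, assign = [False] * len(dst), []

    def search(i: int, cost: float) -> None:
        nonlocal best_cost, best
        if cost >= best_cost:
            return  # bound: this branch cannot beat the incumbent
        if i == len(src):
            best_cost, best = cost, assign[:]
            return
        p, label = src[i]
        q = apply_affine(T, p)
        for j, (r, lab) in enumerate(dst):
            if used[j] or lab != label:
                continue  # attribute constraint
            used[j] = True
            assign.append(j)
            search(i + 1, cost + math.dist(q, r))
            assign.pop()
            used[j] = False

    search(0, 0.0)
    return best


def mean_centroid_error(T: Affine, src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean Euclidean distance between transformed and observed centroids."""
    return sum(math.dist(apply_affine(T, p), q) for p, q in zip(src, dst)) / len(src)


# Hypothetical labelled points; the "rescan" is the original shifted by (2, 1).
orig = [((0, 0), "tl"), ((10, 0), "tr"), ((0, 5), "bl"), ((10, 5), "br"), ((4, 2), "tl")]
scan = [((12, 6), "br"), ((2, 1), "tl"), ((6, 3), "tl"), ((12, 1), "tr"), ((2, 6), "bl")]
assignment = match_attributed(orig, scan, (1, 0, 0, 0, 1, 0))  # identity as coarse guess
src_pts = [p for p, _ in orig]
dst_pts = [scan[j][0] for j in assignment]
T = fit_affine(src_pts, dst_pts)
print(assignment, T, mean_centroid_error(T, src_pts, dst_pts))
```

Because a partial sum of distances can only grow as more pairs are added, it is a valid lower bound for pruning; the worst case is still exponential, so a practical version would likely order candidates by distance or restrict them to a spatial neighborhood to tighten the bound early. Refitting the transform on all matched points, rather than on four corners alone, mirrors the abstract's motivation for using all the data points.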

Journal: International Journal on Document Analysis and Recognition
Publication Date: Nov 1, 2002
Citations: 30

Similar Papers

Digital Image Text Recognition Using Machine Learning Algorithms
Chaitanya U ... Maneesha Dodda
International Journal for Research in Applied Science and Engineering Technology | VOL. 11
30 Jun 2023

Sophisticated and modernized library running system with OCR algorithm using IoT
D Karthikeyan ... K Selvakumar
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 24
01 Dec 2021

OCTess: AN OPTICAL CHARACTER RECOGNITION ALGORITHM FOR AUTOMATED DATA EXTRACTION OF SPECTRAL DOMAIN OPTICAL COHERENCE TOMOGRAPHY REPORTS.
Michael Balas ... Marko M Popovic
Retina | VOL. 44
06 Nov 2023

Classifying Promotion Images Using Optical Character Recognition and Naïve Bayes Classifier
Hubert ... Derwin Suhartono
Procedia Computer Science | VOL. 179
01 Jan 2020
