Context information from search engines for document recognition

Michael Donoser, Silke Wagner, Horst Bischof

doi:10.1016/j.patrec.2009.10.003

Abstract

In this work we propose the use of contextual information provided by web search engine queries to improve text recognition performance. We first describe a framework for automated text recognition from images. It detects text areas in images by analyzing Maximally Stable Extremal Regions (MSERs) and recognizes characters by simple template matching. The main emphasis of the paper is on a novel method for exploiting contextual information to improve the recognition results obtained. We propose to analyze the results of web search engine queries at two levels of detail (word and sentence level), both of which significantly improve the overall text recognition performance. Experimental evaluations on reference data sets show that the approach outperforms dictionary-based methods and that, even with a low-quality single-character recognition method, the proposed web search engine extension yields reasonable text recognition results. This work received the “Best Scientific Paper Award” at the International Conference on Pattern Recognition (ICPR) 2008 (Donoser et al., 2008).

Full Text