Word Level Script Identification of Text in Low Resolution Images of Display Boards Using Wavelet Features

S A Angadi, M M Kodabagi

doi:10.1007/978-81-322-0740-5_26

Abstract

Automated systems for understanding low resolution images of display boards are facilitating several new applications such as blind assistants, tour guide systems, location aware systems and many more. Script identification at the character/word level is an important pre-processing step in the development of such systems, prior to further image analysis. In this paper, a new approach for word level script identification of text in low resolution images of display boards is presented. The proposed methodology uses horizontal run statistics and wavelet features to distinguish five Indian scripts, namely Hindi, Kannada, English, Malayalam and Tamil. The method works in two phases. In the first phase, wavelet transform based texture features, such as zone wise wavelet energy features, vertical run statistical features of wavelet coefficients and wavelet log mean deviation features of decomposed energy bands at 2 levels, are obtained from training word images, and knowledge bases are constructed, one for each script/language under study. The second phase is testing, in which the test word image is first processed to obtain horizontal run statistics that determine whether it belongs to the Hindi script. If it does not, a newly defined discriminant function that measures the city block distance between the test sample and the pre-constructed knowledge base of every script is used to identify the script of the test sample. The proposed method is robust and insensitive to variations in font size and style, number of characters, thickness and spacing between characters, noise, and other degradations. The proposed method achieves an overall identification accuracy of 89.7% and individual identification accuracies of 92% for Kannada script, 97.67% for English, 82.5% for Malayalam and 87% for Tamil script.
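The second-phase idea of matching a test word against per-script knowledge bases by city block distance can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet family, the zoning, the run statistics or the exact discriminant function, so the sketch assumes an unnormalised Haar decomposition, uses only the mean energy of the detail bands at 2 levels as features, and picks the script whose stored feature vector is nearest. All function names and the knowledge-base layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def haar_level(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of a 2-D Haar-style decomposition (assumed wavelet).

    Works on 2x2 blocks and assumes even height and width.
    Returns (approximation, row_diff, col_diff, diagonal).
    """
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((a + b + d + e) / 4)   # local average
            r_row.append((a + b - d - e) / 4)   # change between the two rows
            c_row.append((a - b + d - e) / 4)   # change between the two columns
            d_row.append((a - b - d + e) / 4)   # diagonal change
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag


def band_energy(band: Matrix) -> float:
    """Mean squared coefficient of one sub-band."""
    values = [v for row in band for v in row]
    return sum(v * v for v in values) / len(values)


def wavelet_energy_features(img: Matrix, levels: int = 2) -> List[float]:
    """Energies of the three detail bands at each level (3 * levels values)."""
    features: List[float] = []
    current = img
    for _ in range(levels):
        current, row_diff, col_diff, diag = haar_level(current)
        features += [band_energy(row_diff), band_energy(col_diff), band_energy(diag)]
    return features


def city_block(u: Sequence[float], v: Sequence[float]) -> float:
    """City block (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def identify_script(features: Sequence[float],
                    knowledge_base: Dict[str, Sequence[float]]) -> str:
    """Return the script label whose stored vector is nearest in L1 distance."""
    return min(knowledge_base, key=lambda s: city_block(features, knowledge_base[s]))
```

In a fuller system the knowledge base would hold statistics computed from many training word images per script, and the Hindi check based on horizontal run statistics would run before this distance comparison.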

Full Text