Language identification from multi-lingual scene text images: a CNN based classifier ensemble approach

Neelotpal Chakraborty, Soumyadeep Kundu, Ram Sarkar, Sayantan Paul, Subhadip Basu, Ayatullah Faruk Mollah

doi:10.1007/s12652-020-02528-4

https://doi.org/10.1007/s12652-020-02528-4

Abstract

Over the past two decades, detecting text regions in complex natural images has attracted considerable research interest, because these regions are a source of information that can be used for many purposes. Such regions may, however, contain text in multiple languages, so identifying the language of a detected scene text becomes important for further information processing. Language identification of text captured in the wild is an extremely challenging problem in the domain of scene text recognition. In this paper, a deep learning-based classifier combination approach is proposed for language identification from multi-lingual scene text images. A minimalist Convolutional Neural Network architecture is used as the base model. Five variants of an input image are passed through the base model separately: the three individual channels of the RGB color model (R for red, G for green and B for blue), the RGB image itself, and the grayscale image. The outputs of these five models are combined using classifier combination approaches based on the sum rule and the product rule. The performance of the proposed model has been evaluated on standard datasets such as KAIST and MLe2e, as well as on an in-house multi-lingual scene text dataset. The experimental results show that the proposed model outperforms some state-of-the-art methods considered here for comparison.
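
The sum-rule and product-rule combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the five models outputs a probability vector over the same set of language classes. The function names, the label set, and the example scores are hypothetical and are not taken from the paper; the renormalisation in the product rule is likewise an illustrative choice.

```python
from math import prod

def sum_rule(probabilities):
    """Average the per-model scores for each class."""
    n_models = len(probabilities)
    return [sum(scores) / n_models for scores in zip(*probabilities)]

def product_rule(probabilities):
    """Multiply the per-model scores for each class, then renormalise."""
    products = [prod(scores) for scores in zip(*probabilities)]
    total = sum(products)
    if total == 0:
        # Every class was vetoed by at least one model; return raw products.
        return products
    return [p / total for p in products]

def predict(combined, labels):
    """Return the label with the highest combined score."""
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best]

# Hypothetical label set and made-up softmax outputs, one vector per
# input variant (R, G, B, RGB, grayscale). Not results from the paper.
LABELS = ["English", "Korean", "Chinese"]
model_outputs = {
    "R":    [0.6, 0.3, 0.1],
    "G":    [0.5, 0.4, 0.1],
    "B":    [0.2, 0.7, 0.1],
    "RGB":  [0.7, 0.2, 0.1],
    "gray": [0.6, 0.3, 0.1],
}

scores = list(model_outputs.values())
print("sum rule:    ", predict(sum_rule(scores), LABELS))
print("product rule:", predict(product_rule(scores), LABELS))
```

In this sketch the product rule penalises a class heavily when any single model assigns it a low score, whereas the sum rule is more tolerant of one dissenting model; the two rules can therefore disagree on borderline inputs.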

Full Text