Optimal feature and classifier selection for text region classification in natural scene images using Weka tool

Rituraj Soni,Bijendra Kumar,Satish Chand

doi:10.1007/s11042-019-07998-z

Abstract

The problem of text detection and localization in scene images has always been challenging for the researchers over the years due to diversities present in these images. This diversity includes variation in fonts, size, color, different backgrounds, etc. The textual content in such images can be helpful for humans in many different domains like visually impaired people, scene understanding, intelligent navigation, etc. The natural scene contains some non-text objects along with relevant text objects, and it is necessary to classify them appropriately & accurately to increase the performance of the detection and localization method. The classification of text regions in scene images depends on the selection of optimal features and optimal classifier. This work contributes to finding both the optimal feature set and the optimal classifier with the help of weka tool. In this paper, first, we detect the possible text regions with the help of the improved MSER algorithm; then, we extract 11 features on these potential text regions. From these 11 features, we choose an optimal feature set for discrimination between text and non-text components with the help of the CfsSubsetEval and BFS parameter of the Weka Tool. We trained several classifiers using these optimal features with the help of Weka tool on the ICDAR 2013 training set. The performance of these classifiers is compared empirically based on the classification accuracy obtained using Weka tool. Based on this empirical estimation, Naive Bayes Classifier with the highest accuracy of 92.5% is proposed as an optimal choice for classification purpose.

Full Text