AUTNT - A component level dataset for text non-text classification and benchmarking with novel script invariant feature descriptors and D-CNN

Tauseef Khan, Ayatullah Faruk Mollah

doi:10.1007/s11042-019-08028-8

Abstract

Automated scene text recognition from camera images has been a prominent research area over the last few decades. Classification of foreground object components from camera images is an essential step of Text Information Extraction (TIE). Text/non-text separation from complex document images as well as unstructured natural images is still a challenging task. Although some works have been reported in this direction, standard component-level benchmark datasets specifically for text/non-text classification are not available. In this paper, a new multi-script dataset of text and non-text components is reported along with multi-purpose ground truth annotations. A novel feature set is also designed on the basis of distance information of medial skeleton points to set benchmark performance on this dataset. In addition, a Deep Convolution Neural Network (D-CNN) based automated feature extraction and classification framework is developed for benchmarking purposes. Further insight is provided by assessing these two benchmark methods separately on component images originating from documents and from natural scenes. Experimental results show that classification accuracy is over 94.00% for the medial skeleton based feature descriptors and over 96.00% for the D-CNN framework on both types of sources, which is promising for practical scenarios.

Full Text