Video scene text frames categorization for text detection and recognition

Longfei Qin, Palaiahnakote Shivakumara, Chew Lim Tan, Umapada Pal, Tong Lu

doi:10.1109/icpr.2016.7900241

Abstract

Developing a single text detection and recognition method that works across different video types is hard because video characteristics vary widely. This paper proposes a new method for categorizing video text frames into six types, namely advertisement, signboard, license plate, front page of a book or magazine, street view, and general items, in order to improve text detection and recognition rates. We propose symmetry features based on gradient vector flow, computed on the Canny and Sobel edge images of each input frame, to identify candidate edge components. For each candidate edge component image, we then extract both global and local features using colors from different channels in a new way. In addition, the proposed method extracts statistical and structural features from the spatial distribution of candidate pixels at multiple scales. Finally, the extracted features are fed to a logistic classifier for categorization. The local and global features are evaluated both separately and in combination using confusion matrices. The performance of the proposed categorization method is assessed through several text detection and recognition experiments conducted before and after categorization. We find that the proposed categorization improves text detection and recognition performance.
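As an illustration of the confusion-matrix evaluation mentioned above, the following is a minimal sketch in plain Python. The category identifiers, the function names `confusion_matrix` and `per_class_recall`, and the toy labels in the usage note are assumptions made for this example; they are not taken from the paper's implementation or data.

```python
# Hypothetical identifiers for the six frame categories named in the abstract.
CATEGORIES = [
    "advertisement",
    "signboard",
    "license_plate",
    "book_or_magazine",
    "street_view",
    "general",
]


def confusion_matrix(true_labels, predicted_labels, categories=CATEGORIES):
    """Return matrix[i][j]: number of frames of class i predicted as class j."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    index = {name: k for k, name in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal count divided by row total; None for classes with no frames."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else None)
    return recalls
```

For example, with toy labels `["signboard", "signboard", "street_view"]` as ground truth and `["signboard", "street_view", "street_view"]` as predictions, the signboard row holds one correct and one misclassified frame. Computing one such matrix per feature set (local only, global only, combined) would allow the kind of comparison the abstract describes.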

Full Text