Chinese Text Detection and Recognition in Natural Scene Using HOG and SVM

Boran Yu, Hongjie Wan

doi:10.12783/dtcse/itms2016/9461

Abstract

Detecting and recognizing text in natural scene images remains an important yet challenging problem. A number of methods have recently been proposed for reading text from scene images, but these are designed for the English alphabet and are poorly suited to Chinese characters, which present unique challenges: highly varied fonts, complex backgrounds and uneven illumination, a very large number of distinct characters, and strokes that are not connected within a single character. In this paper we design a method tailored to reading Chinese characters from natural scenes. In the character detection phase, we use the MSER (Maximally Stable Extremal Regions) method to extract candidate characters and then merge the extracted strokes using mathematical morphology operations. Based on the attributes of Chinese characters, we also define heuristic rules that separate text from non-text in order to filter the candidate character regions. We then describe each candidate with a HOG (Histogram of Oriented Gradients) descriptor and classify it with an SVM (Support Vector Machine); regions classified as positive are taken to be the regions covering text. In the text recognition stage, we use KNN (K-Nearest Neighbors) as the classifier. We test the method on 400 natural scene images containing Chinese characters, collected from different sources, and the results validate the effectiveness of our approach.
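The feature and recognition steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MSER, morphology, heuristic filtering and SVM stages are omitted, the HOG variant is simplified (no overlapping block normalisation), and it assumes for illustration that the KNN classifier operates on HOG-style features, which the abstract does not state. The function names, the cell and bin sizes, and the striped toy patches standing in for character images are all hypothetical.

```python
import math
from collections import Counter


def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude, then L2-normalised as a single vector.
    Assumption: overlapping block normalisation of full HOG is omitted."""
    h, w = len(img), len(img[0])
    cells_y, cells_x = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(cells_x)] for _ in range(cells_y)]
    for y in range(cells_y * cell):
        for x in range(cells_x * cell):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            hist[y // cell][x // cell][b] += mag
    vec = [v for row in hist for c in row for v in c]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to `query`
    (squared Euclidean distance). `train` is a list of (vector, label)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stripes(vertical, size=8, shift=0):
    """Hypothetical toy patch with 2-pixel-wide stripes (not real character data)."""
    return [
        [255 if (((x if vertical else y) + shift) // 2) % 2 == 0 else 0
         for x in range(size)]
        for y in range(size)
    ]


# Toy "training set": two shifted examples per class.
train = [
    (hog_descriptor(stripes(v, shift=s)), "vertical" if v else "horizontal")
    for v in (True, False)
    for s in (0, 1)
]
query = hog_descriptor(stripes(True, shift=3))
print(knn_predict(train, query, k=3))
```

In a real pipeline the patches would be the candidate regions that survive detection, resized to a fixed size, and the labels would be character classes. A library HOG implementation and a trained SVM would replace the toy pieces here.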
