A study for high performance character extraction from color scene images

Keiichiro Shirai, Hiroaki Yamamoto, Masanori Wakabayashi, Masayuki Okamoto

doi:10.1109/das.2008.57

Abstract

This paper describes a method for extracting character strings from scene images. Most characters in scene images share the same color and font size within each word or text line. In our algorithm, a scene image is first divided into several blocks based on edges in the color space. Then blobs, which consist of pixels of similar color, are extracted from each block by clustering in a color space. These blobs correspond to either characters or background patterns. After the blobs are connected using their aspect ratios and pitches, an SVM (Support Vector Machine) applied to several textural features of the blobs classifies each connected blob as a character or a background pattern. Tests on 251 images from the ICDAR 2003 Text Locating Competition show the effectiveness of our algorithm.
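The blob extraction step can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the connectivity rule, so the plain k-means in RGB space, the 4-connectivity, and the function names `kmeans_labels` and `extract_blobs` below are all assumptions made for illustration and not the authors' implementation.

```python
from collections import deque


def dist2(a, b):
    """Squared Euclidean distance between two color tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(block, centers):
    """Label every pixel with the index of its nearest color center."""
    return [[min(range(len(centers)), key=lambda k: dist2(p, centers[k]))
             for p in row] for row in block]


def kmeans_labels(block, centers, iters=10):
    """Assumed clustering step: plain k-means on the pixels of one block.

    block   -- list of rows, each row a list of (R, G, B) tuples
    centers -- initial color centers chosen by the caller
    """
    centers = [tuple(float(v) for v in c) for c in centers]
    for _ in range(iters):
        labels = assign(block, centers)
        for k in range(len(centers)):
            members = [p for row, lrow in zip(block, labels)
                       for p, lab in zip(row, lrow) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return assign(block, centers)


def extract_blobs(labels):
    """Group 4-connected pixels with the same label into blobs.

    Returns a list of (label, [(row, col), ...]) pairs.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and labels[ny][nx] == labels[r][c]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append((labels[r][c], pixels))
    return blobs
```

In this sketch each returned blob is only a candidate. Under the pipeline described above, candidates would still need to be linked by aspect ratio and pitch and then classified by the SVM before being accepted as characters.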

Full Text