SWT voting-based color reduction for text detection in natural scene images

Andrej Ikica, Peter Peer

doi:10.1186/1687-6180-2013-95

Abstract

In this article, we propose a novel stroke width transform (SWT) voting-based color reduction method for detecting text in natural scene images. Unlike other text detection approaches, which mostly rely on either text structure or color, the proposed method combines both by supervising a text-oriented color reduction process with additional SWT information. SWT pixels mapped to color space vote in favor of the color they correspond to. Colors that receive a high SWT vote most likely belong to text areas and are blocked from being mean-shifted away. The literature does not explicitly address the SWT search direction issue; thus, we propose an adaptive sub-block method for determining the correct SWT direction. Both the SWT voting-based color reduction and the SWT direction determination methods are evaluated on binary (text/non-text) images obtained from a challenging Computer Vision Lab optical character recognition database. The SWT voting-based color reduction method outperforms the state-of-the-art text-oriented color reduction approach.
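The voting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform RGB quantization, the function names `swt_color_votes` and `protected_colors`, and the `bins` and `min_fraction` parameters are all assumptions made for illustration, and the subsequent mean-shift step is not shown. The sketch assumes an 8-bit RGB image and a boolean mask marking pixels that carry an SWT response.

```python
import numpy as np

def swt_color_votes(image, swt_mask, bins=8):
    """Count, per quantized RGB bin, how many SWT pixels fall into it.

    image:    (H, W, 3) uint8 array.
    swt_mask: (H, W) boolean array, True where a pixel has an SWT response.
    bins:     quantization levels per channel (an illustrative choice).
    """
    # Quantize each 8-bit channel into `bins` levels: values in [0, bins).
    q = (image.astype(np.int64) * bins) // 256
    # Flatten the three channel indices into a single bin index per pixel.
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    # Only SWT pixels vote; every vote goes to the bin of the pixel's color.
    votes = np.bincount(idx[swt_mask], minlength=bins ** 3)
    return votes.reshape(bins, bins, bins)

def protected_colors(votes, min_fraction=0.1):
    """Flag bins whose share of all votes reaches a hypothetical threshold."""
    total = votes.sum()
    if total == 0:
        return np.zeros(votes.shape, dtype=bool)
    return votes / total >= min_fraction
```

In a full pipeline, the bins flagged by `protected_colors` would be the ones a color reduction step leaves untouched, while the remaining colors are free to be merged.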

Highlights

Text detection in natural scene images is a very challenging task, far from being completely solved
Since the popular text detection datasets such as International Conference on Document Analysis and Recognition (ICDAR) [15,16] and Street View Text (SVT) [21] are annotated with word rectangles, they are inappropriate for evaluating our color reduction method, which covers the first two stages of the text detection flowchart (Figure 1)
We evaluated both stroke width transform (SWT) direction determination and SWT voting-based color reduction methods on CVL OCR BIN DB

Summary

Introduction

Text detection in natural scene images is a very challenging task, far from being completely solved. Uneven illumination and the presence of an almost unlimited number of text fonts, sizes, and orientations pose great difficulties even to state-of-the-art text detection methods. Unlike document images, where text is usually superimposed on either blank or complex backgrounds and is more distinct [1,2,3], natural scene images contain scene text, which is already a part of the captured scene and is often much less distinct. The state-of-the-art literature distinguishes between two major text detection approaches: texture-based and region-based. Texture-based methods [4,5,6,7] scan images at different scales, inspect area under the sliding window

Objectives

Results

Conclusion