Visual Saliency Models for Text Detection in Real World.

Renwu Gao, Seiichi Uchida, Volkmar Frinken, Asif Shahab, Faisal Shafait

doi:10.1371/journal.pone.0114539

Open Access

Abstract

This paper evaluates the degree of saliency of texts in natural scenes using visual saliency models. A large-scale scene image database with pixel-level ground truth is created for this purpose. Using this scene image database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of the objects are calculated. The receiver operating characteristic curve is employed to evaluate the saliency of scene texts, as calculated by the visual saliency models. A visualization is given of the distribution of scene texts and non-texts in the space constructed by three kinds of saliency maps, which are calculated using Itti's visual saliency model with intensity, color and orientation features. This visualization indicates that text characters are more salient than their non-text neighbors and can be separated from the background. Therefore, scene texts can be extracted from the scene images. With this in mind, a new visual saliency architecture, named the hierarchical visual saliency model, is proposed. The hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region of interest. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts.
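The two-stage structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_saliency` is a hypothetical stand-in for Itti's model (it scores each pixel by its contrast against the mean of the region under consideration), and the names `otsu_threshold` and `hierarchical_saliency` are likewise chosen here for illustration. Only the overall flow (saliency map, Otsu's global threshold, saliency recomputed on the salient region) follows the text.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Otsu's global threshold: pick the cut that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_lo = np.cumsum(hist).astype(float)      # pixel count in bins 0..k
    weight_hi = weight_lo[-1] - weight_lo          # pixel count in bins k+1..end
    cum_sum = np.cumsum(hist * centers)
    mean_lo = cum_sum / np.maximum(weight_lo, 1)
    mean_hi = (cum_sum[-1] - cum_sum) / np.maximum(weight_hi, 1)
    between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
    # Return the upper edge of the best bin so that "value > threshold" is the high class.
    return edges[1:][np.argmax(between)]


def toy_saliency(image, mask=None):
    """Hypothetical stand-in for Itti's model (assumption, not from the paper):
    contrast of each pixel against the mean of the masked region, scaled to [0, 1]."""
    if mask is None:
        mask = np.ones(image.shape, dtype=bool)
    saliency = np.zeros(image.shape, dtype=float)
    if mask.any():
        saliency[mask] = np.abs(image[mask] - image[mask].mean())
        peak = saliency.max()
        if peak > 0:
            saliency /= peak
    return saliency


def hierarchical_saliency(image, saliency_fn=toy_saliency):
    """Two-stage sketch: saliency map -> Otsu mask -> saliency restricted to the mask."""
    first_map = saliency_fn(image)                               # stage 1: whole image
    region = first_map > otsu_threshold(first_map.ravel())       # salient region
    final_map = saliency_fn(image, mask=region)                  # stage 2: region only
    return final_map, region
```

Passing a real saliency model as `saliency_fn` would only require that it accept an image and an optional boolean mask; that interface is a design choice of this sketch rather than something the paper specifies.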

Highlights

In daily life, text can be seen almost anywhere, at any time.
The second experiment (Fig. 6(a)) is the receiver operating characteristic (ROC)-based performance evaluation of Itti’s visual saliency model with different features. This experiment was done in order to investigate how salient scene texts are for each low-level feature.
In the case of Itti’s visual saliency maps (c) and Harel’s graph-based visual saliency model (d), scene texts seem to be more salient than the non-texts, although the non-texts were not well inhibited.
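The ROC-based evaluation mentioned in the highlights can be illustrated with a short sketch. It assumes a saliency map and a pixel-level ground-truth mask of the same shape, thresholds the map at evenly spaced levels, and records the true-positive rate (text pixels kept) against the false-positive rate (non-text pixels kept). The function names and the number of thresholds are assumptions made here; the area-under-curve helper is added only as a convenient summary and is not taken from the source.

```python
import numpy as np


def roc_curve_points(saliency, text_mask, num_thresholds=101):
    """Sweep thresholds from the map's maximum down to its minimum.

    Returns (fpr, tpr): the fraction of non-text and of text pixels whose
    saliency is at or above each threshold.
    """
    saliency = np.asarray(saliency, dtype=float).ravel()
    text_mask = np.asarray(text_mask, dtype=bool).ravel()
    thresholds = np.linspace(saliency.max(), saliency.min(), num_thresholds)
    text_values = saliency[text_mask]
    non_text_values = saliency[~text_mask]
    n_text = max(text_values.size, 1)          # guard against an empty class
    n_non_text = max(non_text_values.size, 1)
    tpr = np.array([(text_values >= t).sum() / n_text for t in thresholds])
    fpr = np.array([(non_text_values >= t).sum() / n_non_text for t in thresholds])
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve, starting from the point (0, 0)."""
    fpr = np.concatenate(([0.0], fpr))
    tpr = np.concatenate(([0.0], tpr))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A curve that rises quickly toward the top-left corner means that text pixels tend to receive higher saliency than non-text pixels for the feature under test.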

Summary

Introduction

In daily life, text can be seen almost anywhere, at any time. [...] information on what is to be obeyed; while shopping, labels display the price and other details of the products. All these indicate that there are many texts in natural scenes. The focus of this paper is to analyze the saliency of texts in natural scenes according to different measures of saliency. Scene texts, such as traffic signal texts and advertisement texts on signboards, are considered to convey important information to pedestrians. We believe that texts have some kinds of identifying properties (e.g. intensity, color or orientation) compared to their non-text neighbors (the so-called pop-up). This is plausible, considering that such texts are used to communicate important information efficiently to passengers. To quantify these properties, we use visual saliency models in this paper.

Methods

Results

Discussion

Conclusion