Distinguishing text/non-text natural images with Multi-Dimensional Recurrent Neural Networks

Pengyuan Lyu, Xiang Bai, Baoguang Shi, Chengquan Zhang

doi:10.1109/icpr.2016.7900256

Abstract

In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from a large collection of natural images. To this end, we propose a novel neural network architecture, termed Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text from non-text images by classifying local image blocks, taking into account both the pixels within each region and the dependencies among blocks. The network is composed of a Convolutional Neural Network (CNN) and a Multi-Dimensional Recurrent Neural Network (MDRNN). The CNN extracts rich, high-level image representations, while the MDRNN analyzes dependencies along multiple directions and produces block-level predictions. Evaluating CMDRNN on a public dataset, we observe improvements over prior art in both speed and accuracy.
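
The block-level idea can be illustrated with a minimal sketch. It is not the authors' implementation: the grid of feature vectors merely stands in for CNN output, the hidden state is reduced to one scalar per block, and the function names, the weights, and the rule that an image counts as text when any block exceeds a threshold are all assumptions made for illustration.

```python
import math

# The four scan directions, expressed as (vertical flip, horizontal flip).
DIRECTIONS = [(False, False), (False, True), (True, False), (True, True)]


def scan_2d(feats, w_x, w_up, w_left, bias):
    """One 2D recurrence starting at the top-left corner of the block grid.

    feats is an H x W grid of feature vectors (lists of floats). Each block's
    hidden state depends on its own features and on the hidden states of the
    blocks above it and to its left.
    """
    h, w = len(feats), len(feats[0])
    hidden = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            up = hidden[i - 1][j] if i > 0 else 0.0
            left = hidden[i][j - 1] if j > 0 else 0.0
            drive = sum(wk * xk for wk, xk in zip(w_x, feats[i][j]))
            hidden[i][j] = math.tanh(drive + w_up * up + w_left * left + bias)
    return hidden


def flip(grid, vertical, horizontal):
    """Return the grid with rows and/or columns reversed."""
    rows = grid[::-1] if vertical else grid
    return [row[::-1] if horizontal else row for row in rows]


def block_probabilities(feats, params):
    """Sum the four directional scans and squash to per-block probabilities.

    params holds one hypothetical (w_x, w_up, w_left, bias) tuple per direction.
    """
    h, w = len(feats), len(feats[0])
    total = [[0.0] * w for _ in range(h)]
    for (vertical, horizontal), p in zip(DIRECTIONS, params):
        scanned = scan_2d(flip(feats, vertical, horizontal), *p)
        scanned = flip(scanned, vertical, horizontal)  # back to original layout
        for i in range(h):
            for j in range(w):
                total[i][j] += scanned[i][j]
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in total]


def is_text_image(probs, threshold=0.5):
    """Assumed image-level rule: text if any block exceeds the threshold."""
    return any(p > threshold for row in probs for p in row)


# Toy 2 x 3 grid: only the block at row 0, column 1 has a strong response.
feats = [[[0.0, 0.0], [5.0, 5.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
params = [([1.0, 1.0], 0.5, 0.5, 0.0)] * 4  # hypothetical, untrained weights
probs = block_probabilities(feats, params)
print(is_text_image(probs))
```

Scanning from all four corners lets each block's prediction draw on context from every part of the grid, which is the kind of multi-directional dependency the abstract attributes to the MDRNN.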

Full Text