Abstract

In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from a lot of natural images. To this end, we propose a novel neural network architecture, termed Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text/non-text images by classifying local image blocks, taking both region pixels and dependencies among blocks into account. The network is composed of a Convolutional Neural Network (CNN) and a Multi-Dimensional Recurrent Neural Network (MDRNN). The CNN extracts rich and high-level image representation, while the MDRNN analyzes dependencies along multiple directions and produces block-level predictions. By evaluating CMDRNN on a public dataset, we observe improvements over prior arts in terms of both speed and accuracy.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.