Abstract

Automatic text recognition in document images is an important task in many real-world applications. Several systems have been proposed to accomplish this task. However, a little attention has been given to document images obtained by mobile phones. To meet this need, we propose a new system that integrates preprocessing, features extraction and classification in order to recognize text contained in the document images acquired by a smartphone. The preprocessing phase is applied to locate the text region, and then segment that region into text line images. In the second phase, a sliding window divides the text-line image into a sequence of frames; afterwards a deep convolutional neural network (CNN) model is used to extract features from each frame. Finally, an architecture that combines the bidirectional recurrent neural network (RNN), the gated recurrent units (GRU) block and the connectionist temporal classification (CTC) layer is explored to ensure the classification phase. The proposed system has been tested on the ICDAR2015 Smartphone document OCR dataset and the experimental results show that the proposed system is capable to achieve promising recognition rates.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call