Real-Time Document Localization in Natural Images by Recursive Application of a CNN

Khurram Javed, Faisal Shafait

doi:10.1109/icdar.2017.26

Abstract

Seamless integration of information from digital and paper documents is crucial for efficient knowledge management. One convenient way to achieve this is to digitize a document from a natural image, which requires precise localization of the document in the image. Several methods have been proposed to solve this problem, but they rely on traditional image processing techniques that are not robust to extreme viewpoint and background variations. Deep Convolutional Neural Networks (CNNs), on the other hand, have been shown to be extremely robust to variations in background and viewpoint in object detection and classification tasks. Inspired by their robustness and generality, we propose a novel CNN-based method to accurately localize documents in real time. We model the localization problem as a key point detection problem. The four corners of the document are jointly predicted by a Deep Convolutional Neural Network. We then refine our prediction using a novel recursive application of a CNN. The performance of the system is evaluated on the ICDAR 2015 SmartDoc Competition 1 dataset. The results are comparable to the state of the art on simple backgrounds and improve the state of the art from the previous 86% to 94% on the complex background. Code, dataset, and models are available at: https://github.com/KhurramJaved96/Recursive-CNNs.
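
The abstract does not spell out how the recursive refinement proceeds, so the following is only a minimal sketch of one plausible reading: a predictor estimates a corner inside a window, the window is shrunk around that estimate, and the predictor is applied again until the window is small. The names `refine_corner`, `toy_predict`, `retain`, and `min_size` are hypothetical, as are their default values, and `toy_predict` is a stand-in whose error scales with the window size rather than a trained CNN.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def refine_corner(
    predict: Callable[[Region], Point],
    region: Region,
    retain: float = 0.85,    # assumed fraction of the window kept per step
    min_size: float = 10.0,  # assumed stopping size in pixels
) -> Point:
    """Repeatedly re-predict a corner inside a shrinking window."""
    left, top, right, bottom = region
    while True:
        x, y = predict((left, top, right, bottom))
        width, height = right - left, bottom - top
        if min(width, height) <= min_size:
            return (x, y)
        new_w, new_h = width * retain, height * retain
        # Center the smaller window on the estimate, clamped so that it
        # stays inside the current window.
        new_left = min(max(x - new_w / 2, left), right - new_w)
        new_top = min(max(y - new_h / 2, top), bottom - new_h)
        left, top = new_left, new_top
        right, bottom = new_left + new_w, new_top + new_h


# Stand-in for a corner-predicting network (assumption): the estimate is
# off by 10% of the window size, so it improves as the window shrinks.
TRUE_CORNER = (120.0, 80.0)


def toy_predict(region: Region) -> Point:
    left, top, right, bottom = region
    return (
        TRUE_CORNER[0] + 0.1 * (right - left),
        TRUE_CORNER[1] + 0.1 * (bottom - top),
    )


coarse = toy_predict((0.0, 0.0, 640.0, 480.0))
refined = refine_corner(toy_predict, (0.0, 0.0, 640.0, 480.0))
```

In this toy setting the coarse estimate is tens of pixels from the true corner, while the refined estimate ends up within about two pixels. This illustrates why re-applying a predictor to progressively smaller windows can sharpen a key point estimate. It says nothing about the accuracy of the actual system.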

Full Text