Abstract

Seamless integration of information from digital and paper documents is crucial for efficient knowledge management. One convenient way to achieve this is to digitize a document from a natural image, which requires precise localization of the document in the image. Several methods have been proposed to solve this problem, but they rely on traditional image processing techniques that are not robust to extreme viewpoint and background variations. Deep Convolutional Neural Networks (CNNs), on the other hand, have been shown to be extremely robust to variations in background and viewpoint in object detection and classification tasks. Inspired by their robustness and generality, we propose a novel CNN-based method to accurately localize documents in real time. We model the localization problem as a key point detection problem. The four corners of the document are jointly predicted by a deep CNN. We then refine each prediction using a novel recursive application of a CNN. The performance of the system is evaluated on the ICDAR 2015 SmartDoc Competition 1 dataset. The results are comparable to the state of the art on simple backgrounds and improve the state of the art on the complex background from the previous 86% to 94%. Code, dataset, and models are available at: https://github.com/KhurramJaved96/Recursive-CNNs.
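The recursive refinement step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `refine_corner`, the `predict` callback, and the `shrink` and `min_size` parameters are hypothetical, and the cropping schedule is an assumption. The sketch assumes a corner predictor that returns a normalized position inside whatever window it is shown; the window is then shrunk around that estimate and the predictor is applied again, so a fixed relative error translates into a smaller pixel error at each step.

```python
from typing import Callable, Tuple

Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels
Point = Tuple[float, float]


def refine_corner(
    predict: Callable[[Region], Point],
    region: Region,
    shrink: float = 0.5,
    min_size: float = 10.0,
) -> Point:
    """Recursively narrow a window around one corner estimate.

    `predict` stands in for a corner-refinement network (hypothetical
    interface): given a window, it returns the corner position as
    normalized (u, v) coordinates in [0, 1] relative to that window.
    """
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be strictly between 0 and 1")

    x0, y0, x1, y1 = region
    while True:
        w, h = x1 - x0, y1 - y0
        u, v = predict((x0, y0, x1, y1))
        # Map the normalized prediction back to image coordinates.
        px, py = x0 + u * w, y0 + v * h
        if w <= min_size or h <= min_size:
            return px, py
        # Shrink the window around the estimate, clamped so that the new
        # window stays inside the current one.
        nw, nh = w * shrink, h * shrink
        nx0 = min(max(px - nw / 2, x0), x1 - nw)
        ny0 = min(max(py - nh / 2, y0), y1 - nh)
        x0, y0, x1, y1 = nx0, ny0, nx0 + nw, ny0 + nh
```

In use, `predict` would wrap a trained model that crops the image to the given window and runs a forward pass, and `refine_corner` would be called once per corner, starting from a window around the jointly predicted estimate.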
