Abstract

This paper proposes an efficient deep-learning method for rectifying distorted document images, improving the legibility of the graphics and text they contain. The framework comprises two interconnected U-Nets that predict, respectively, a 3D coordinate map and a forward map for the distorted input image. A page mask is predicted first and fed to both U-Nets to improve their performance. In the final step, the predicted forward map is converted into the corresponding backward map, which is used to rectify the distorted image. Experimental results show that the predicted page masks and 3D coordinate maps significantly improve the accuracy of the forward maps used for rectification, and that the rectified images are satisfactory both globally and locally.
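The final step, converting a forward map into a backward map and using it to rectify the image, can be illustrated with a minimal NumPy sketch. It assumes a common convention that the abstract does not spell out: the forward map stores, for each pixel of the distorted image, its (row, column) position on the flat page, while the backward map stores, for each pixel of the flat page, the distorted-image pixel to sample. The function names `forward_to_backward` and `rectify` are hypothetical, and the nearest-neighbour scattering used here is a simplification chosen for brevity, not necessarily the conversion used in the paper. A practical pipeline would more likely fill the backward map with scattered-data interpolation.

```python
import numpy as np


def forward_to_backward(fwd, mask, out_shape):
    """Invert a forward map by nearest-neighbour scattering (illustrative only).

    fwd:  (H, W, 2) array; fwd[y, x] is the (row, col) of that pixel on the flat page.
    mask: (H, W) boolean page mask; only page pixels are scattered.
    Returns an (H_out, W_out, 2) integer array; -1 marks unfilled target pixels.
    """
    h_out, w_out = out_shape
    bwd = np.full((h_out, w_out, 2), -1, dtype=np.int64)
    ys, xs = np.nonzero(mask)
    tgt = np.rint(fwd[ys, xs]).astype(np.int64)
    # Discard source pixels whose target falls outside the output canvas.
    inside = (
        (tgt[:, 0] >= 0) & (tgt[:, 0] < h_out)
        & (tgt[:, 1] >= 0) & (tgt[:, 1] < w_out)
    )
    ys, xs, tgt = ys[inside], xs[inside], tgt[inside]
    bwd[tgt[:, 0], tgt[:, 1], 0] = ys
    bwd[tgt[:, 0], tgt[:, 1], 1] = xs
    return bwd


def rectify(image, bwd, fill=0):
    """Sample the distorted image at the locations stored in the backward map."""
    out = np.full(bwd.shape[:2] + image.shape[2:], fill, dtype=image.dtype)
    valid = bwd[..., 0] >= 0
    coords = bwd[valid]
    out[valid] = image[coords[:, 0], coords[:, 1]]
    return out


# Toy check: the "distortion" is a horizontal flip of a 4x4 page.
flat = np.arange(16).reshape(4, 4)
distorted = flat[:, ::-1]
rows, cols = np.mgrid[0:4, 0:4]
# Pixel (y, x) of the flipped image belongs at (y, 3 - x) on the flat page.
fwd = np.stack([rows, 3 - cols], axis=-1).astype(float)
mask = np.ones((4, 4), dtype=bool)

bwd = forward_to_backward(fwd, mask, flat.shape)
rectified = rectify(distorted, bwd)
assert np.array_equal(rectified, flat)
```

In this toy case every target pixel receives exactly one source pixel, so no holes remain. With real predicted maps, scattering leaves gaps and collisions, which is why an interpolating inversion is normally preferred over this sketch.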
