Abstract

Document skew is often introduced during the capturing process of the document image processing pipeline and may seriously affect the performance of subsequent stages of segmentation and recognition. Skew detection is often accomplished with the use of horizontal projections, while recently, a new approach that is based on vertical projections has been introduced. In this paper, we use the technique of minimum bounding box area in order to combine a horizontal with a new reinforced vertical projection profile method. We are motivated by the fact that the horizontal and the novel vertical projection profiles are found to be complementary to each other. We claim that the proposed approach has more accurate performance compared with other state-of-the-art skew detection algorithms; it deals with all the drawbacks of the projection profile methods; it is more noise and warp resistant and gives accurate results for any kind of printed document image. For these reasons, it can be efficiently applied to historical machine printed or multicolumn documents, documents with figures and tables, while it is robust for any kind of script. Extended experimental results on two databases in different skew angle range, with representative printed documents of all kinds, as well as printed documents of two historical books, prove the efficiency of the proposed approach. There is also a comparison with commercial products in several cases where the contribution of the proposed algorithm is demonstrated at optical character recognition level. Moreover, an analysis of the accuracy performance of the main elements of the proposed technique is also performed.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.