Abstract

Extraction of filled-in information from document images in the presence of template poses challenges due to geometrical distortion. Filled-in document image consists of null background, general information foreground and vital information imposed layer. Template document image consists of null background and general information foreground layer. In this paper a novel document image registration technique has been proposed to extract imposed layer from input document image. A convex polygon is constructed around the content of the input and the template image using convex hull. The vertices of the convex polygons of input and template are paired based on minimum Euclidean distance. Each vertex of the input convex polygon is subjected to transformation for the permutable combinations of rotation and scaling. Translation is handled by tight crop. For every transformation of the input vertices, Minimum Hausdorff distance (MHD) is computed. Minimum Hausdorff distance identifies the rotation and scaling values by which the input image should be transformed to align it to the template. Since transformation is an estimation process, the components in the input image do not overlay exactly on the components in the template, therefore connected component technique is applied to extract contour boxes at word level to identify partially overlapping components. Geometrical features such as density, area and degree of overlapping are extracted and compared between partially overlapping components to identify and eliminate components common to input image and template image. The residue constitutes imposed layer. Experimental results indicate the efficacy of the proposed model with computational complexity. Experiment has been conducted on variety of filled-in forms, applications and bank cheques. Data sets have been generated as test sets for comparative analysis.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.