Abstract

A novel method is proposed in this study to recognize the line structure of table-form documents, such as telephone bills and office documents. The line structure of a table-form document is composed mainly of horizontal and vertical line segments. By treating this segment structure as a line pattern, structure recognition is reduced to a line-pattern matching problem, which can be solved with the technique of relaxation. The proposed method consists of a learning phase and a recognition phase. In the learning phase, the line structures of various kinds of table-form documents are taken as templates and extracted with a line extraction algorithm, and a unique number serving as a form ID is assigned to each line pattern. In the recognition phase, the line pattern of the test document is matched by relaxation against the patterns created in the learning phase, and the form ID of the best match is taken as the ID of the test document. To improve the performance of the proposed method, an algorithm is also presented that reduces the number of line segments involved in the matching process. Experimental results demonstrate the practicality of the proposed method.
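The recognition phase can be illustrated with a minimal sketch of relaxation-based line-pattern matching. It is not the authors' implementation: the segment representation (normalized endpoints), the unary similarity, the pairwise compatibility based on relative midpoint displacement, the extra "no match" label, and the final scoring rule are all assumptions made for illustration. Only the probability update follows the standard nonlinear relaxation-labeling scheme. The helper names `relax` and `recognize` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: (x1, y1, x2, y2) in coordinates normalized to [0, 1].
# A segment with y1 == y2 is treated as horizontal, otherwise vertical.
Segment = Tuple[float, float, float, float]


def orientation(s: Segment) -> str:
    return "H" if s[1] == s[3] else "V"


def midpoint(s: Segment) -> Tuple[float, float]:
    return ((s[0] + s[2]) / 2, (s[1] + s[3]) / 2)


def unary(a: Segment, b: Segment, sigma: float = 0.05) -> float:
    """Assumed segment similarity in [0, 1]; zero when orientations differ."""
    if orientation(a) != orientation(b):
        return 0.0
    d = sum(abs(u - v) for u, v in zip(a, b))
    return math.exp(-d / sigma)


def compat(a1: Segment, b1: Segment, a2: Segment, b2: Segment, sigma: float = 0.05) -> float:
    """Assumed compatibility r in (-1, 1]: near +1 when test pair (a1, a2) has the
    same relative midpoint displacement as template pair (b1, b2)."""
    (ax1, ay1), (ax2, ay2) = midpoint(a1), midpoint(a2)
    (bx1, by1), (bx2, by2) = midpoint(b1), midpoint(b2)
    err = abs((ax2 - ax1) - (bx2 - bx1)) + abs((ay2 - ay1) - (by2 - by1))
    return 2.0 * math.exp(-err / sigma) - 1.0


def relax(test: List[Segment], tmpl: List[Segment], iters: int = 10, nil: float = 0.05) -> float:
    """Label each test segment with a template segment (or 'no match', index m)
    by relaxation; return a hypothetical match score in [0, 1]."""
    n, m = len(test), len(tmpl)
    if n == 0 or m == 0:
        return 0.0
    s = [[unary(a, b) for b in tmpl] for a in test]
    # Initial label probabilities from unary similarity; last entry is 'no match'.
    p = []
    for row in s:
        w = row + [nil]
        p.append([x / sum(w) for x in w])
    # Precompute r[i][j][k][l]; the 'no match' label has zero compatibility.
    r = [[[[compat(test[i], tmpl[j], test[k], tmpl[l]) for l in range(m)]
           for k in range(n)] for j in range(m)] for i in range(n)]
    for _ in range(iters):
        new_p = []
        for i in range(n):
            upd = []
            for j in range(m + 1):
                q = 0.0  # support from the other segments' current labels
                if j < m and n > 1:
                    q = sum(r[i][j][k][l] * p[k][l]
                            for k in range(n) if k != i
                            for l in range(m)) / (n - 1)
                upd.append(p[i][j] * (1.0 + q))  # q in [-1, 1], so update >= 0
            total = sum(upd) or 1.0
            new_p.append([x / total for x in upd])
        p = new_p
    # Score: expected similarity of the final assignments, averaged over test segments.
    return sum(sum(p[i][j] * s[i][j] for j in range(m)) for i in range(n)) / n


def recognize(test: List[Segment], templates: Dict[int, List[Segment]]) -> int:
    """Return the form ID whose template line pattern matches best."""
    return max(templates, key=lambda fid: relax(test, templates[fid]))


templates = {
    1: [(0.1, 0.1, 0.9, 0.1), (0.1, 0.5, 0.9, 0.5), (0.1, 0.1, 0.1, 0.5), (0.9, 0.1, 0.9, 0.5)],
    2: [(0.1, 0.2, 0.9, 0.2), (0.1, 0.8, 0.9, 0.8), (0.5, 0.2, 0.5, 0.8)],
}
# A slightly shifted scan of form 1 (made-up coordinates for illustration).
test = [(0.11, 0.1, 0.91, 0.1), (0.11, 0.51, 0.91, 0.51),
        (0.11, 0.1, 0.11, 0.51), (0.91, 0.1, 0.91, 0.51)]
print(recognize(test, templates))  # → 1
```

In this sketch each iteration costs on the order of (NM)^2 compatibility lookups for N test segments and M template segments, which illustrates why reducing the number of line segments before matching pays off.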
