Abstract

In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of three steps: character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for separating adjacent characters that are connected. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching for a keyword in Korean document images, especially when the quality of the documents is quite poor.
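
The query creation and matching steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the features, the distance measure, or the decision rule, so the toy 4-dimensional character features in `CHAR_FEATURES`, the Euclidean distance, the averaging of character distances, and the `threshold` value are hypothetical stand-ins and not the paper's method.

```python
from math import sqrt

# Hypothetical per-character feature templates (toy 4-dimensional vectors).
# A real system would extract these from character images.
CHAR_FEATURES = {
    "한": [1.0, 0.0, 0.5, 0.2],
    "국": [0.1, 0.9, 0.3, 0.7],
    "어": [0.4, 0.4, 0.8, 0.1],
}


def build_query(keyword):
    """Build the query as the sequence of its characters' feature vectors."""
    return [CHAR_FEATURES[ch] for ch in keyword]


def char_distance(a, b):
    """Euclidean distance between two character feature vectors (assumed measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def word_distance(query, candidate):
    """Word-to-word distance as the mean of character-to-character distances.

    Words with a different number of characters are treated as non-matching.
    """
    if len(query) != len(candidate):
        return float("inf")
    total = sum(char_distance(q, c) for q, c in zip(query, candidate))
    return total / len(query)


def spot(keyword, words, threshold=0.3):
    """Return indices of segmented words whose distance to the query is within threshold."""
    query = build_query(keyword)
    return [i for i, w in enumerate(words) if word_distance(query, w) <= threshold]


# Segmented words from a (hypothetical) document image, each a list of
# per-character feature vectors; the first is a slightly noisy "한국".
document_words = [
    [[0.95, 0.05, 0.5, 0.25], [0.1, 0.85, 0.35, 0.7]],
    [CHAR_FEATURES["어"], CHAR_FEATURES["국"]],
    [CHAR_FEATURES["한"]],
]
matches = spot("한국", document_words)
```

In this sketch the query never passes through character recognition: the keyword is turned into features once, and each segmented word in the image is compared against it directly, which is the general idea that distinguishes keyword spotting from OCR-based retrieval.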
