Keyword Spotting on Korean Document Images by Matching the Keyword Image

Soo Hyung Kim, Sang Cheol Park, Chang Bu Jeong, Hyuk Ro Park, Ji Soo Kim, Guee Sang Lee

doi:10.1007/11599517_18

Abstract

In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of three steps: character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for separating adjacent characters that are connected to each other. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching for keywords in Korean document images, especially when the document quality is poor.
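The query creation and matching steps can be illustrated with a minimal sketch. The abstract does not specify the features, the distance measure, or the decision rule, so everything here is an assumption: characters are represented by hypothetical fixed-length feature vectors, character matching uses Euclidean distance, and a word matches when the mean character distance falls below a hypothetical threshold. The names `build_query`, `word_distance`, and `spot_keyword` are illustrative only and are not taken from the paper.

```python
import math


def char_distance(a, b):
    """Euclidean distance between two character feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_query(char_features, keyword):
    """Build the query as the sequence of feature vectors of its characters.

    char_features is a hypothetical lookup table from a character to its
    feature vector; the paper's actual features are not given in the abstract.
    """
    return [char_features[ch] for ch in keyword]


def word_distance(query, candidate):
    """Word-to-word distance as the mean of character-to-character distances.

    Words with a different number of segmented characters are treated as
    non-matching (an assumption made to keep the sketch simple).
    """
    if not query or len(query) != len(candidate):
        return math.inf
    total = sum(char_distance(q, c) for q, c in zip(query, candidate))
    return total / len(query)


def spot_keyword(query, words, threshold):
    """Return the indices of segmented words whose distance is within the threshold."""
    return [i for i, w in enumerate(words) if word_distance(query, w) <= threshold]
```

For example, with a two-character keyword and three segmented word images, calling `spot_keyword(query, words, threshold)` returns the positions of the words whose characters are all close to the query's characters. A stricter rule, such as requiring every character distance to pass the threshold individually, would fit the same structure.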

Full Text