Visual language model for keyword spotting on historical Mongolian document images

Hongxi Wei, Guanglai Gao

doi:10.1109/ccdc.2017.7978797

Abstract

The Bag-of-Visual-Words (BoVW) approach has attracted some attention in the field of keyword spotting. However, the BoVW approach discards the spatial relations among the visual words. In this study, a visual language model is therefore integrated into the BoVW framework to add spatial information. To perform keyword spotting, two well-known retrieval schemes, the query likelihood model and KL divergence, are adopted. The experimental results show that, compared with the original BoVW approach, the visual language model significantly improves keyword spotting performance on a collection of historical Mongolian document images. The influence of different codebook sizes on performance is also analyzed, and the most appropriate codebook size is determined.
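
To make the two retrieval schemes concrete, the following is a minimal sketch and not the authors' implementation. It assumes each word image has already been quantized into an ordered sequence of visual-word IDs, models spatial relations with adjacent-pair (bigram) statistics, and uses Jelinek-Mercer smoothing against the collection. The function names `ngrams`, `make_model`, and `score`, the smoothing weight `lam`, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Return the n-grams of a visual-word sequence (n=1 is plain BoVW)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def make_model(doc_events, coll_counts, coll_total, lam=0.5):
    """Build a smoothed event distribution for one word image.

    Assumption: Jelinek-Mercer interpolation between the image's own
    relative frequencies and the collection's relative frequencies.
    """
    counts, total = Counter(doc_events), len(doc_events)

    def prob(event):
        p_doc = counts[event] / total if total else 0.0
        p_coll = coll_counts[event] / coll_total if coll_total else 0.0
        # Tiny floor guards against log(0) for events unseen anywhere.
        return max((1 - lam) * p_doc + lam * p_coll, 1e-12)

    return prob


def score(query, docs, n=2, lam=0.5):
    """Score every word image against the query.

    Returns {name: (query_log_likelihood, kl_divergence)}.
    Higher log-likelihood and lower KL divergence both mean a better match.
    """
    coll = Counter(e for seq in docs.values() for e in ngrams(seq, n))
    coll_total = sum(coll.values())
    q_events = ngrams(query, n)
    q_counts = Counter(q_events)
    q_total = len(q_events)
    results = {}
    for name, seq in docs.items():
        prob = make_model(ngrams(seq, n), coll, coll_total, lam)
        # Query likelihood: log P(Q | D) under the image's model.
        ql = sum(math.log(prob(e)) for e in q_events)
        # KL divergence between the query model and the image's model.
        kl = sum(
            (c / q_total) * math.log((c / q_total) / prob(e))
            for e, c in q_counts.items()
        )
        results[name] = (ql, kl)
    return results


# Toy collection: "a" and "b" contain the same visual words in a different order.
docs = {"a": [3, 7, 7, 2], "b": [2, 7, 7, 3], "c": [5, 1, 4, 4]}
query = [3, 7, 7, 2]
unigram_scores = score(query, docs, n=1)  # order-blind, like plain BoVW
bigram_scores = score(query, docs, n=2)   # sensitive to adjacency
```

In this toy setting the unigram scores cannot separate images "a" and "b", because their visual-word histograms are identical, whereas the bigram scores can. That is the kind of spatial information the abstract says plain BoVW discards.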

Full Text