Representing word image using visual word embeddings and RNN for keyword spotting on historical document images

Hongxi Wei, Guanglai Gao, Hui Zhang

doi:10.1109/icme.2017.8019403

Abstract

In the Bag-of-Visual-Words (BoVW) framework, visual words are treated as independent of each other, which discards the spatial order between visual words and leaves the representation without semantic information. Inspired by word embeddings, this study applies a similar embedding procedure to a large number of visual words, yielding an embedding vector for each visual word. For a word image, the average of the embedding vectors of all visual words within the image is taken as its embedding vector. Moreover, a Recurrent Neural Network (RNN) is used, in the manner of an auto-encoder, to encode each word image into an embedding. The RNN embeddings and the visual word embeddings are complementary, so every word image in this study is represented by combining the two. Experimental results show that the proposed representation is superior to traditional BoVW, spatial pyramid matching, and latent Dirichlet allocation.
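The averaging step described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: the toy `embedding_table`, the visual word IDs, and the stand-in RNN vector are invented for illustration. The abstract does not say how the two embeddings are combined, so the concatenation in `combine` is an assumption, not the authors' stated method.

```python
from typing import Dict, List, Sequence


def average_embedding(visual_word_ids: Sequence[int],
                      embedding_table: Dict[int, List[float]]) -> List[float]:
    """Average the embedding vectors of all visual words in one word image."""
    if not visual_word_ids:
        raise ValueError("word image contains no visual words")
    dim = len(embedding_table[visual_word_ids[0]])
    total = [0.0] * dim
    for word_id in visual_word_ids:
        for i, value in enumerate(embedding_table[word_id]):
            total[i] += value
    return [t / len(visual_word_ids) for t in total]


def combine(visual_vec: Sequence[float], rnn_vec: Sequence[float]) -> List[float]:
    """ASSUMPTION: combine the two representations by simple concatenation."""
    return list(visual_vec) + list(rnn_vec)


# Toy data (hypothetical): three visual words with 2-dimensional embeddings.
embedding_table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

# One word image, given as the IDs of the visual words detected in it.
word_image = [0, 1, 1, 2]
avg = average_embedding(word_image, embedding_table)

# Stand-in for an RNN embedding of the same word image (values are made up).
rnn_embedding = [0.2, -0.1, 0.4]
combined = combine(avg, rnn_embedding)
```

Because averaging ignores the order of the visual words, it cannot by itself recover the spatial order that BoVW discards, which is consistent with the abstract's description of the sequential RNN embedding as complementary.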

Similar Papers

Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents
Hongxi Wei ... Guanglai Gao
01 Aug 2018

Using Word Mover’s Distance with Spatial Constraints for Measuring Similarity Between Mongolian Word Images
Hongxi Wei ... Xiangdong Su
01 Jan 2017

A case study of BoVW for keyword spotting on historical Mongolian document images
Xing Guo ... Xiangdong Su
01 Oct 2016

Integrating Visual Word Embeddings into Translation Language Model for Keyword Spotting on Historical Mongolian Document Images
Hongxi Wei ... Guanglai Gao
01 Jan 2018
