Abstract

In the Bag-of-Visual-Words (BoVW) framework, visual words are independent of each other. As a result, the representation not only discards the spatial order between visual words but also lacks semantic information. Inspired by word embeddings, this study applies a similar embedding procedure to a large number of visual words, which yields an embedding vector for each visual word. The embedding vector of a word image is then the average of the embedding vectors of all visual words within it. In addition, a Recurrent Neural Network (RNN) is used, in the manner of an auto-encoder, to encode each word image into an embedding. Because the RNN embeddings and the visual word embeddings are complementary, every word image is represented by combining the two. Experimental results show that the proposed representation outperforms traditional BoVW, spatial pyramid matching, and latent Dirichlet allocation.
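The sketch below illustrates two steps from the abstract: averaging visual word embeddings into a word image embedding, and combining that embedding with an RNN embedding. It is not the authors' implementation. It assumes the visual word embeddings have already been learned and are given as a lookup table, that the RNN auto-encoder embedding is supplied by a separately trained model (not shown), and that the average is taken over every visual word occurrence in the image. The abstract does not specify how the two embeddings are combined, so L2-normalized concatenation is used here as a hypothetical choice. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def average_visual_word_embedding(visual_word_ids: Sequence[int],
                                  embeddings: Dict[int, List[float]]) -> List[float]:
    """Mean of the embedding vectors of all visual word occurrences in one word image."""
    if not visual_word_ids:
        raise ValueError("word image contains no visual words")
    dim = len(embeddings[visual_word_ids[0]])
    total = [0.0] * dim
    for vw in visual_word_ids:
        # Repeated visual words contribute once per occurrence (assumption).
        for i, x in enumerate(embeddings[vw]):
            total[i] += x
    n = len(visual_word_ids)
    return [x / n for x in total]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_embeddings(vw_embedding: List[float],
                       rnn_embedding: List[float]) -> List[float]:
    """Hypothetical combination: concatenate the two L2-normalized embeddings."""
    return l2_normalize(vw_embedding) + l2_normalize(rnn_embedding)


# Toy example: 2-D visual word embeddings and a 2-D RNN embedding (made-up values).
emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
image_visual_words = [0, 1, 2, 2]
vw_vec = average_visual_word_embedding(image_visual_words, emb)   # [0.75, 0.75]
rnn_vec = [3.0, 4.0]  # stand-in for the RNN auto-encoder output
representation = combine_embeddings(vw_vec, rnn_vec)
print(vw_vec, representation)
```

Normalizing each part before concatenation keeps one embedding from dominating distance computations purely because of its scale. A weighted concatenation or a learned fusion would be equally plausible readings of "combining".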
