Abstract

The shift from one-hot to distributed representations, popularly referred to as word embeddings, has changed the landscape of the natural language processing (NLP) and information retrieval (IR) communities. In the domain of document images, we have always appreciated the need for learning a holistic word image representation, which is popularly used for the task of word spotting. The representations proposed for word spotting differ from word embeddings in text, since the latter capture the semantic aspects of a word, a crucial ingredient for numerous NLP and IR tasks. In this work, we attempt to encode the notion of semantics into word image representations by bringing in advancements from the textual domain. We propose two novel forms of representation: the first is designed to be inflection invariant by focusing on the approximate linguistic root of the word, while the second is built along the lines of recent textual word embedding techniques such as Word2Vec. We observe that such representations are useful for traditional word spotting and also enrich the search results by accounting for the semantic nature of the task. We conduct our experiments on challenging document images drawn from historical and modern collections, handwritten and printed domains, and Latin and Indic scripts. For semantic evaluation, we have prepared a large synthetic word image dataset and report interesting results on standard semantic evaluation metrics such as word analogy and word similarity.
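
To make the two evaluation metrics named above concrete, here is a minimal sketch of how word similarity and word analogy are typically scored over an embedding table. It assumes the learned representations are available as a dictionary of NumPy vectors; the function names and the toy vectors are illustrative only and are not the paper's API or data.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity(emb, pairs):
    """Score each word pair by cosine similarity; these scores are
    usually rank-correlated (Spearman) with human similarity judgements."""
    return [cosine(emb[a], emb[b]) for a, b in pairs]

def word_analogy(emb, a, b, c, topk=1):
    """Answer 'a is to b as c is to ?' with the vector-offset method:
    pick the word whose vector is closest to b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [(w, cosine(v, target)) for w, v in emb.items()
                  if w not in {a, b, c}]
    return sorted(candidates, key=lambda x: x[1], reverse=True)[:topk]

# Toy usage with made-up 3-d vectors (real embeddings are learned).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.7, 0.9, 0.0]),
    "woman": np.array([0.7, 0.1, 0.9]),
}
print(word_similarity(emb, [("king", "queen"), ("king", "man")]))
print(word_analogy(emb, "man", "king", "woman"))  # expected top answer: 'queen'
```

In the paper's setting the same protocol would be applied to word image representations rather than textual ones, which is what makes the semantic evaluation of word images possible.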
