Abstract

To improve the accuracy of image retrieval systems, research focus has shifted from designing sophisticated low-level feature extraction algorithms to combining image retrieval with rich semantics and knowledge-based methods. In this paper, we aim to improve text-based image retrieval for complex natural language queries by using a semantic parser (Knowledge Parser, or K-Parser). From text written in natural language, the K-Parser extracts a graphical semantic representation of the objects involved, their properties, and their relations. We analyze both the textual image captions and the natural language queries with the K-Parser. As a technical solution, we leverage RDF in two ways: first, we store the parsed image captions as RDF triples; second, we translate image queries into SPARQL queries. When we apply this approach to the Flickr8k dataset with a set of 16 custom queries, we find that the K-Parser exhibits some biases that reduce the accuracy of the queries. We propose two techniques to address these weaknesses: (1) we introduce a set of rules that transform the output of the K-Parser and fix some basic, recurrent parsing mistakes that occur on the Flickr8k captions; (2) we leverage two popular commonsense knowledge bases, ConceptNet and WordNet, to raise the accuracy of queries on broad concepts. Using those two techniques, we can fix most of the initial retrieval errors and accurately execute our set of 16 queries on the Flickr8k dataset.
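As a rough illustration of the idea described above, the following sketch stores caption parses as subject-predicate-object triples and answers a query by matching graph patterns, in the spirit of a SPARQL basic graph pattern. It is not the authors' implementation: the identifiers (img1, has_event, agent, instance_of), the sample triples, and the hand-written hypernym table standing in for WordNet or ConceptNet lookups are all assumptions made for the example, and a real system would use an RDF store and actual SPARQL.

```python
# Toy in-memory triple store. All names below are hypothetical and do not
# reproduce actual K-Parser output or the paper's RDF vocabulary.
TRIPLES = {
    ("img1", "has_event", "e1"),
    ("e1", "instance_of", "run"),
    ("e1", "agent", "x1"),
    ("x1", "instance_of", "dog"),
    ("img2", "has_event", "e2"),
    ("e2", "instance_of", "run"),
    ("e2", "agent", "x2"),
    ("x2", "instance_of", "child"),
}

# Hand-written stand-in for a WordNet/ConceptNet hypernym lookup (assumption).
HYPERNYMS = {"dog": {"animal"}, "child": {"person"}}


def expand(triples, hypernyms):
    """Add one instance_of triple per direct hypernym of an existing class."""
    extra = {
        (s, "instance_of", h)
        for (s, p, o) in triples
        if p == "instance_of"
        for h in hypernyms.get(o, ())
    }
    return triples | extra


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


def images_for(agent_class, action, triples):
    """Return sorted image ids whose caption has `agent_class` doing `action`."""
    query = [
        ("?img", "has_event", "?e"),
        ("?e", "instance_of", action),
        ("?e", "agent", "?a"),
        ("?a", "instance_of", agent_class),
    ]
    return sorted({b["?img"] for b in match(query, triples)})
```

In this sketch, a query on the broad concept "animal" finds nothing in the raw triples, because the caption only mentions a dog; after `expand` adds the hypernym triples, the same query retrieves the image. This mirrors, in simplified form, why a commonsense knowledge base can help with queries on broad concepts.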
