Abstract
What does it mean to produce a good description of an image? Is a description good because it correctly identifies all of the objects in the image, because it describes the interesting attributes of the objects, or because it is short, yet informative? Grice’s Cooperative Principle, stated as “Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged” (Grice, 1975), alongside other ideas from pragmatics in communication, has proven useful in thinking about language generation (Hovy, 1987; McKeown et al., 1995). The Cooperative Principle provides one possible framework for thinking about the generation and evaluation of image descriptions.

The immediate question is whether automatic image description is within the scope of the Cooperative Principle. Consider the task of searching for images using natural language, where the purpose of the exchange is for the user to quickly and accurately find images that match their information needs. In this scenario, the user formulates a complete-sentence query to express those needs, e.g. A sheepdog chasing sheep in a field, and initiates an exchange with the system in the form of a sequence of one-shot conversations. In this exchange, both participants can describe images in natural language, and a successful outcome relies on each participant succinctly and correctly expressing their beliefs about the images. It follows that we can think of image description as facilitating communication between people and computers, and thus take advantage of the Principle’s maxims of Quantity, Quality, Relevance, and Manner in guiding the development and evaluation of automatic image description models.

An overview of the image description literature from the perspective of Grice’s maxims can be found in Table 1. The most apparent omission is the lack of research devoted to generating minimally informative descriptions: the maxim of Quantity. Attending to this maxim will become increasingly important as the quality and coverage of object, attribute, and scene detectors increase. It would be undesirable to develop models that describe every detected object in an image, because doing so would be likely to violate the maxim of Quantity (Spain and Perona, 2010). Similarly, if it becomes possible to associate an accurate attribute with each object in the image, it will be important to be sparing in the application of those attributes: is it relevant to describe “furry” sheep when there are no sheared sheep in an image?

How should image description models be evaluated with respect to the maxims of the Cooperative Principle? So far, model evaluation has focused on automatic text-based measures, such as Unigram BLEU, and on human judgements of semantic correctness (see Hodosh et al. (2013) for a discussion of framing image description as a ranking task, and Elliott and Keller (2014) for a correlation analysis of text-based measures against human judgements). The semantic correctness judgement task typically presents a variant of “Rate the relevance of the description for this image”, which only evaluates the description vis-à-vis the maxim of Relevance. One exception is the study of Mitchell et al. (2012), in which judgements about the ordering of noun phrases (the maxim of Manner) were also collected. The importance of being able to evaluate according to multiple maxims becomes clearer as computer vision becomes more accurate.
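To make the text-based measures above concrete, the following is a minimal sketch of sentence-level unigram BLEU (clipped unigram precision multiplied by a brevity penalty). The tokenisation, example sentences, and function name are illustrative assumptions, not the exact evaluation protocol of the studies cited above.

```python
from collections import Counter
import math

def unigram_bleu(candidate, references):
    """Sentence-level unigram BLEU: clipped unigram precision times brevity penalty."""
    cand_counts = Counter(candidate)
    # Clip each candidate unigram count by its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / max(len(candidate), 1)

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * precision

# Example: a generated description scored against two human references.
candidate = "a sheepdog chasing sheep in a field".split()
references = ["a dog chases sheep across a grassy field".split(),
              "a sheepdog herds a flock of sheep".split()]
print(round(unigram_bleu(candidate, references), 3))  # 0.714
```

A measure of this kind rewards lexical overlap with the references, which says little about whether a description observes the maxims of Quantity or Manner.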
It seems intuitive that a model that describes and relates every object in the image could be characterised as generating Relevant and Quality descriptions, but not necessarily descriptions of an appropriate Quantity.
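The image-search scenario above also suggests a complementary, non-textual evaluation in the spirit of the ranking framing of Hodosh et al. (2013): score every query sentence against every image and report Recall@k for the gold pairings. The sketch below assumes a precomputed sentence-by-image score matrix; the toy scores and the function name recall_at_k are hypothetical.

```python
import numpy as np

def recall_at_k(scores, k):
    """Recall@k for sentence-to-image retrieval.

    scores[i, j] is the model's score for query sentence i against image j,
    and image i is assumed to be the gold match for sentence i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # Rank of the gold image: its position when images are sorted by
        # descending score for sentence i.
        order = np.argsort(-row)
        rank = int(np.where(order == i)[0][0]) + 1
        hits += int(rank <= k)
    return hits / len(scores)

# Toy example: 3 query sentences scored against 3 images.
scores = np.array([[0.9, 0.2, 0.1],
                   [0.3, 0.8, 0.4],
                   [0.1, 0.7, 0.6]])
print(recall_at_k(scores, 1))  # 2 of 3 gold images ranked first -> 0.666...
```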