Abstract

We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters (“a”, “b”, “c”, ...) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our model outperforms a non-pragmatic baseline as well as a word-level RSA captioner.

Highlights

  • The success of automatic image captioning (Farhadi et al., 2010; Mitchell et al., 2012; Karpathy and Fei-Fei, 2015; Vinyals et al., 2015) demonstrates compellingly that end-to-end statistical models can align visual information with language

  • We present a neural image captioning system that is a pragmatic speaker as defined by the Rational Speech Acts (RSA) model

  • Advantage of Incremental RSA: we observe that 66% of the time in which the S1 caption is referentially successful and the S0 …

Summary

Introduction

The success of automatic image captioning (Farhadi et al., 2010; Mitchell et al., 2012; Karpathy and Fei-Fei, 2015; Vinyals et al., 2015) demonstrates compellingly that end-to-end statistical models can align visual information with language. The RSA speaker achieves this by reasoning both about what is true and about what it is like to be a listener in this context trying to identify the target. This core idea underlies much work in referring expression generation (Dale and Reiter, 1995; Monroe and Potts, 2015; Andreas and Klein, 2016; Monroe et al., 2017) and image captioning (Mao et al., 2016a; Vedantam et al., 2017), but these models do not fully confront the fact that the agents must reason about all possible utterances, which is intractable. We show that such character-level RSA speakers are more effective than literal captioning systems at the task of helping a reader identify the target image among close competitors, and that they outperform word-level RSA captioners in both efficiency and accuracy.
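The character-level idea can be illustrated with a minimal sketch: at each decoding step, take the literal speaker S0's next-character distribution conditioned on each candidate image, derive a literal listener L0 by normalizing over the candidates (assuming a uniform prior), and reweight S0 for the target image by how diagnostic each character is. This is only a schematic of one incremental RSA step, not the authors' implementation; the function name, the `alpha` rationality parameter, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def incremental_rsa_step(char_probs, target, alpha=1.0):
    """One character-level RSA step (illustrative sketch).

    char_probs : array of shape (n_images, vocab_size); row i is the literal
                 speaker S0's next-character distribution given image i.
    target     : index of the image the pragmatic speaker wants to pick out.
    alpha      : rationality parameter scaling the listener's influence.

    Returns S1's next-character distribution for the target image.
    """
    # Literal listener L0: normalize over candidate images per character
    # (uniform prior over candidates), giving P(image | character).
    l0 = char_probs / char_probs.sum(axis=0, keepdims=True)
    # Pragmatic speaker S1: reweight S0 by how strongly each character
    # points the listener at the target, then renormalize over characters.
    s1 = char_probs[target] * l0[target] ** alpha
    return s1 / s1.sum()

# Toy example: two candidate images, a three-character vocabulary.
# Character 1 is much likelier under the target than under the distractor.
probs = np.array([[0.5, 0.3, 0.2],   # S0 given the target image
                  [0.5, 0.1, 0.4]])  # S0 given the distractor
s1 = incremental_rsa_step(probs, target=0)
# The discriminative character is boosted relative to S0's 0.3,
# while the character favored by the distractor is suppressed below 0.2.
```

Because the normalization here runs over the character vocabulary rather than the full space of utterances, each step stays cheap, which is the efficiency point the paper makes against utterance-level RSA.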

Bayesian Pragmatics for Captioning
Applying Bayesian Pragmatics to a Neural Semantics
Step-Wise Inference
Evaluation
Results
Conclusion