Referring Expression Generation Research Articles

Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through symbolic information about objects and their properties, to determine identifying sets of target features during content determination. In recent years, research in visual REG has turned to neural modeling and recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definitions and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across various approaches to REG so far and to argue for integrating and extending different perspectives on visual context that currently co-exist in research on REG. By analyzing the ways in which symbolic REG integrates context in rule-based approaches, we derive a set of categories of contextual integration, including the distinction between positive and negative semantic forces exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has so far considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight, as possible directions for future research, some additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks.
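To make the idea of context-dependent content determination concrete, here is a minimal Python sketch of rule-based attribute selection over a symbolic scene. It is not taken from the article: the function `select_attributes`, the attribute names, the preference order, and the toy scenes are all hypothetical, and the procedure is a simplified illustration loosely modeled on incremental attribute selection from the symbolic REG literature.

```python
def select_attributes(target, distractors, preference_order):
    """Pick attribute-value pairs of `target` until no distractor remains.

    Returns (description, remaining_distractors). The description is
    identifying only if remaining_distractors is empty.
    """
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        if not remaining:
            break  # the target is already uniquely identified
        value = target.get(attr)
        if value is None:
            continue
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:  # keep the attribute only if it excludes a distractor
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    return description, remaining


# Hypothetical scenes: the same target object in two different contexts.
target = {"type": "mug", "color": "red", "size": "large"}
context_a = [{"type": "plate", "color": "red", "size": "large"}]
context_b = [
    {"type": "mug", "color": "blue", "size": "large"},
    {"type": "mug", "color": "red", "size": "small"},
]
ORDER = ["type", "color", "size"]  # assumed preference order

desc_a, left_a = select_attributes(target, context_a, ORDER)
desc_b, left_b = select_attributes(target, context_b, ORDER)
print(desc_a)  # {'type': 'mug'}
print(desc_b)  # {'color': 'red', 'size': 'large'}
```

In the first context the type alone identifies the target, while in the second the type excludes nothing and color and size are both needed, which is the kind of context dependence the abstract describes.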


This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene. One common theoretical approach to this problem is to model the task as a two-agent cooperative scheme in which a ‘speaker’ agent generates the expression that best describes a targeted area and a ‘listener’ agent identifies the target. Several recent REG systems have used deep learning approaches to represent the speaker/listener agents. The Rational Speech Act framework (RSA), a Bayesian approach to pragmatics that can predict human linguistic behavior quite accurately, has been shown to generate high-quality and explainable expressions on toy datasets involving simple visual scenes. Its application to large-scale problems, however, remains largely unexplored. This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes in a multi-step process, with the aim of generating better-explained expressions. We carry out experiments on the RefCOCO and RefCOCO+ datasets and compare our approach with other end-to-end deep learning approaches as well as a variation of RSA to highlight our key contribution. Experimental results show that, while achieving lower accuracy than SOTA deep learning methods, our approach outperforms a similar RSA approach in human comprehension and has an advantage over end-to-end deep learning in limited-data scenarios. Lastly, we provide a detailed analysis of the expression generation process with concrete examples, thus providing a systematic view of error types and deficiencies in the generation process and identifying possible areas for future improvement.
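The speaker/listener scheme can be illustrated with a small Python sketch of the textbook RSA recursion on a toy scene. This is not the paper's system, which combines RSA with deep learning: the objects, the utterance vocabulary, the uniform prior, the zero utterance cost, and the function names are all assumptions made for illustration.

```python
# Hypothetical toy scene: each object maps to the words that are true of it.
OBJECTS = {
    "blue_square": {"blue", "square"},
    "blue_circle": {"blue", "circle"},
    "green_square": {"green", "square"},
}
UTTERANCES = ["blue", "green", "square", "circle"]


def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def literal_listener(utterance):
    """L0(o | u): uniform over the objects the utterance is literally true of."""
    return normalize(
        {obj: 1.0 if utterance in words else 0.0 for obj, words in OBJECTS.items()}
    )


def pragmatic_speaker(target, alpha=1.0):
    """S1(u | o), proportional to L0(o | u) ** alpha (utterance cost assumed zero)."""
    return normalize(
        {u: literal_listener(u)[target] ** alpha for u in UTTERANCES}
    )


def pragmatic_listener(utterance, alpha=1.0):
    """L1(o | u), proportional to S1(u | o) under a uniform prior over objects."""
    return normalize(
        {obj: pragmatic_speaker(obj, alpha)[utterance] for obj in OBJECTS}
    )


speaker = pragmatic_speaker("blue_circle")
listener = pragmatic_listener("blue")
print(max(speaker, key=speaker.get))  # circle
print(max(listener, key=listener.get))  # blue_square
```

In this toy scene the speaker prefers "circle" for the blue circle because it singles out the target, and the pragmatic listener hearing "blue" leans towards the blue square, reasoning that a speaker referring to the blue circle would more likely have said "circle".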



Articles published on Referring Expression Generation

Unified Referring Expression Generation for Bounding Boxes and Segmentations

Rethinking symbolic and visual context in Referring Expression Generation.

A Proposal-Free One-Stage Framework for Referring Expression Comprehension and Generation via Dense Cross-Attention

Perspective-Corrected Spatial Referring Expression Generation for Human–Robot Interaction

Neural referential form selection: Generalisability and interpretability

Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach

Formal concept analysis for the generation of plural referring expressions

Visual question answering based on local-scene-aware referring expression generation

Generating unambiguous and diverse referring expressions

Building referring expression corpora with and without feedback

Attribute-Guided Attention for Referring Expression Generation and Comprehension.

Dempster-Shafer theoretic resolution of referential ambiguity

From image to language and back again

Visual Complexity and Its Effects on Referring Expression Generation.

Effects of Cognitive Effort on the Resolution of Overspecified Descriptions

Generating natural language descriptions using speaker-dependent information

Stars2: a corpus of object descriptions in a visual domain

Production of referring expressions in Arabic

Overspecified references: An experiment on lexical acquisition in a virtual environment

Collaborative Models for Referring Expression Generation in Situated Dialogue
