Abstract

Neural Referring Expression Generation (REG) models have shown promising results in generating expressions that uniquely describe visual objects. However, current REG models still lack the ability to produce diverse and unambiguous referring expressions (REs). To address the lack of diversity, we propose generating a set of diverse REs rather than a single one-shot RE. To reduce the ambiguity of REs, we directly optimise non-differentiable test metrics using reinforcement learning (RL), and we show that our approaches achieve better results under multiple different settings. Specifically, we first present a novel RL approach to REG training which, instead of drawing one sample per input, averages over multiple samples to normalise the reward during RL training. Secondly, we present an innovative REG model that utilises an object attention mechanism to explicitly incorporate information about the target object, optimised with our proposed RL approach. Thirdly, we propose a novel transformer model, optimised with RL, that exploits different levels of visual information. Our human evaluation demonstrates the effectiveness of this model: we improve the state-of-the-art task-success results on RefCOCO testA and testB from 76.95% to 81.66% and from 78.10% to 83.33% respectively, and on RefCOCO+ testA from 58.85% to 83.33%. Finally, we present a thorough comparison of diverse decoding strategies (sampling- and maximisation-based) and how they control the trade-off between quality and diversity.
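The multi-sample reward normalisation mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the general idea of centring each sample's reward by the mean over the samples drawn for the same input, which serves as a baseline without requiring a separate value network. All names and values are illustrative.

```python
def advantages(rewards):
    """Centre each sampled RE's reward by the mean reward over the
    k samples drawn for the same input, yielding the advantage used
    in the policy-gradient update (a self-critical-style baseline)."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

# Toy example: k = 4 sampled referring expressions for one target object,
# each scored by a non-differentiable metric (e.g. CIDEr or task success).
rewards = [0.2, 0.5, 0.8, 0.5]
adv = advantages(rewards)
# The per-sample policy-gradient loss would then be -adv[i] * log p(sample_i),
# so above-average samples are reinforced and below-average ones suppressed.
```

Because the baseline is the empirical mean of the same batch of samples, the advantages sum to zero for each input, which reduces the variance of the gradient estimate.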
