Abstract

The design of data collection experiments involving human participants is a common task in Referring Expression Generation (REG) and related fields. Many (or most) REG data collection tasks are carried out in a human–computer (e.g., web-based) communicative setting, in which participants have no particular addressee in mind and receive no feedback on the appropriateness (e.g., uniqueness) of the descriptions they produce. Others, at possibly higher cost, use participant pairs engaged in some form of dialogue, in which hearers may provide feedback that allows speakers to rephrase ambiguous or otherwise ill-formed descriptions. Leaving the issue of cost aside, however, it remains unclear whether the two methods elicit similar referring expressions for the purposes of REG research. To shed light on this issue, this paper presents a REG corpus built under three experimental conditions: a standard human–computer (or web-based) setting in which no feedback is available to the speaker, and two settings in which feedback on the appropriateness of the description may be provided either by an automated parsing tool or by a second participant at the receiving end of the communication. The corpus contains fully annotated descriptions in two domains (simple geometric objects and realistic human face images) and is provided as a resource for training and testing REG algorithms in these communicative settings.
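
The abstract mentions automated feedback on the appropriateness of a description, i.e., whether it identifies the intended referent uniquely. As an illustration only (the paper does not specify how its parsing tool is implemented), the sketch below treats a scene as a set of objects with symbolic attributes and checks whether a description rules out every distractor; the attribute names and function names here are hypothetical.

```python
# Illustrative sketch of a uniqueness check, not the paper's actual tool.
# A scene object is a dict of attribute-value pairs; a "description" is the
# set of attribute-value pairs extracted from the speaker's utterance.

def matches(description, obj):
    """True if the object satisfies every attribute-value pair in the description."""
    return all(obj.get(attr) == value for attr, value in description.items())

def is_distinguishing(description, target, distractors):
    """A description is appropriate (unique) if it matches the target
    and matches none of the distractors."""
    return matches(description, target) and not any(
        matches(description, d) for d in distractors
    )

# Hypothetical example in the simple geometric-objects domain:
target = {"type": "ball", "colour": "red", "size": "large"}
distractors = [
    {"type": "ball", "colour": "blue", "size": "large"},
    {"type": "cube", "colour": "red", "size": "small"},
]

# "the red ball" singles out the target; "the ball" is ambiguous.
print(is_distinguishing({"type": "ball", "colour": "red"}, target, distractors))  # True
print(is_distinguishing({"type": "ball"}, target, distractors))                   # False
```

A feedback condition of the kind described above could use such a check to prompt the speaker to rephrase whenever `is_distinguishing` returns False.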
