Abstract

Referential success is crucial for collaborative task-solving in shared environments. In face-to-face interactions, humans therefore exploit speech, gesture, and gaze to identify a specific object. We investigate whether and how the gaze behavior of a human interaction partner can be used by a gaze-aware assistance system to improve referential success. Specifically, our system describes objects in the real world to a human listener using on-the-fly speech generation. It continuously interprets listener gaze and implements alternative strategies to react to this implicit feedback. We used this system to investigate which strategy is optimal for task performance: providing an unambiguous, longer instruction right from the beginning, or starting with a shorter, yet ambiguous instruction. Further, the system provides gaze-driven feedback, which can be either underspecified (“No, not that one!”) or contrastive (“Further left!”). As expected, our results show that ambiguous instructions followed by underspecified feedback are not beneficial for task performance, whereas contrastive feedback results in faster interactions. Interestingly, this approach even outperforms unambiguous instructions (manipulated between subjects). However, when the system alternates between underspecified and contrastive feedback to initially ambiguous descriptions in an interleaved manner (within subjects), task performance is similar for both approaches. This suggests that listeners engage more intensely with the system when they can expect it to be cooperative; this expectation, rather than the actual informativity of the spoken feedback, may determine the efficiency of information uptake and performance.

Highlights

  • In situated collaboration, spoken natural language is often used to refer to task-relevant objects in the form of installments

  • There are two remaining questions that we address in the present paper: (1) Can the successful use of listener gaze be replicated in real environments, which are much more complex to handle technically? (2) Can gaze-aware natural language generation (NLG) be used to generate adaptive installments that provide references both incrementally and in the form of contrastive feedback? We present an NLG system that monitors the gaze of the human listener and provides installments only if necessary, that is, if the listener’s gaze indicates incorrect reference resolution (a minimal sketch of such a feedback loop follows this list)

  • In contrast to the data for the subset of ambiguous instructions from Experiment 1, there was no significant difference in performance between the two approaches in Experiment 2
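
A minimal sketch of such a gaze-driven feedback loop is given below. It is an illustration under assumptions, not the authors’ implementation: the tracker wrapper get_fixated_object, the speech callback speak, the dwell threshold, and the 2D object positions are all invented here.

    # Minimal sketch of a gaze-driven feedback loop (illustrative only; the
    # eye-tracker wrapper, speech callback, and thresholds are assumptions).
    import time

    DWELL = 0.3  # seconds a fixation must last before it counts as a hypothesis

    def feedback_loop(target, positions, get_fixated_object, speak,
                      strategy="contrastive"):
        """Watch listener gaze; confirm when the target is fixated, otherwise
        react with underspecified or contrastive spoken feedback."""
        fixated, since = None, time.monotonic()
        while True:
            obj = get_fixated_object()            # e.g. None or an object id
            if obj != fixated:                    # fixation changed: restart dwell
                fixated, since = obj, time.monotonic()
            elif obj is not None and time.monotonic() - since >= DWELL:
                if obj == target:                 # correct resolution: confirm, stop
                    speak("Yes, exactly!")
                    return
                if strategy == "underspecified":  # reject without extra information
                    speak("No, not that one!")
                else:                             # contrastive: point toward target
                    dx = positions[target][0] - positions[obj][0]
                    speak("Further left!" if dx < 0 else "Further right!")
                since = time.monotonic()          # rate-limit repeated feedback
            time.sleep(0.01)                      # poll the tracker at ~100 Hz

The dwell timer serves two purposes here: a fixation only counts as a reference hypothesis once it has lasted long enough, and resetting it after speaking prevents the system from repeating feedback on every tracker sample.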

Introduction

Spoken natural language is often used to refer to task-relevant objects in the form of installments. Human speakers may produce installments without planning an entire unambiguous utterance, and this tendency increases when they are under time pressure (Striegnitz et al., 2012). Speakers can quickly adapt to changes in the surroundings and, in particular, to the listeners’ feedback and actions. As shown by Zarrieß and Schlangen (2016), an artificial speaker can also use installments to generate referring expressions effectively: this was considered intuitive and enhanced the identification of real objects depicted in static images.
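
To make the idea of installments concrete, here is a toy sketch loosely in the spirit of incremental attribute selection (as in Dale and Reiter’s incremental algorithm); the attributes, preference order, and example scene are invented for illustration and do not come from the systems cited above.

    # Toy sketch of producing a referring expression in installments, loosely
    # in the spirit of incremental attribute selection (Dale & Reiter, 1995).
    # Attributes, preference order, and the example scene are invented here.
    def installments(target, distractors, preference=("type", "color", "size")):
        """Yield one descriptive installment at a time, stopping as soon as
        the accumulated description rules out every distractor."""
        remaining = list(distractors)
        for attr in preference:
            value = target[attr]
            survivors = [d for d in remaining if d.get(attr) == value]
            if len(survivors) < len(remaining):   # this attribute helps: speak it
                remaining = survivors
                yield value
            if not remaining:                     # description is now unambiguous
                return

    target = {"type": "mug", "color": "red", "size": "small"}
    scene = [{"type": "mug", "color": "blue", "size": "small"},
             {"type": "plate", "color": "red", "size": "small"}]
    print(list(installments(target, scene)))      # -> ['mug', 'red']

In an interactive setting, each yielded value would be spoken as its own installment, with the next one produced only if listener gaze indicates that the reference has not yet been resolved.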
