A study in responsiveness in spoken dialog

Nigel Ward, Wataru Tsukahara

doi:10.1016/s1071-5819(03)00085-5

Abstract

The future of human–computer interfaces may include systems which are human-like in abilities and behavior. One particularly interesting aspect of human-to-human communication is the ability of some conversation partners to sensitively pick up on the nuances of the other's utterances, as they shift from moment to moment, and to use this information to subtly adjust responses to express interest, supportiveness, sympathy and the like. This paper reports a model of this ability in the context of a spoken dialog system for a tutoring-like interaction. The system inferred the user's internal state (such as feelings of confidence, confusion, pleasure and dependency) from the prosody of the user's utterances and the context, and used this information to select the most appropriate acknowledgement form at each moment. Although straightforward rating revealed no significant preference for a system with this ability, a clear preference was found when users rated the system after listening to a recording of their interaction with it. This suggests that human-like, real-time sensitivity can be of value in interfaces. The paper further discusses ways to discover and quantify the rules of social interaction behind such responses, using corpus-based analysis, developer intuitions and feedback from naive judges, and suggests that the technique of “evaluation after re-listening” is useful for evaluating spoken dialog systems which operate at near-human levels of performance.

Full Text