Abstract

Humans use verbal and non-verbal cues to communicate their intent in collaborative tasks. In situated dialogue, speakers typically direct their interlocutor's attention to referent objects using multimodal cues, and references to such entities are resolved collaboratively. In this study, we designed a multiparty task in which humans teach each other how to assemble furniture, and we captured eye gaze, speech, and pointing gestures. We analysed which multimodal cues carry the most information for resolving referring expressions, and we report an object saliency classifier that uses multisensory input from both speaker and addressee to detect the referent objects during the collaborative task.
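The abstract describes a classifier that fuses speaker and addressee cues (gaze, pointing, speech) to score how likely each candidate object is to be the referent. As a minimal sketch of that idea, the snippet below trains a generic classifier on hand-picked illustrative features; the feature names, the toy data, and the random-forest choice are assumptions for illustration, not the feature set or model reported in the paper.

```python
# Minimal sketch of an object-saliency classifier over multimodal cues.
# Features and model are illustrative assumptions, not the paper's actual setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row describes one candidate object in one time window:
# [speaker_gaze_ratio, addressee_gaze_ratio, pointing_distance_cm, mentioned_in_speech]
X_train = np.array([
    [0.70, 0.55, 12.0, 1],   # fixated by both and pointed at -> referent
    [0.05, 0.10, 85.0, 0],   # barely attended -> non-referent
    [0.40, 0.60, 20.0, 1],
    [0.10, 0.05, 60.0, 0],
])
y_train = np.array([1, 0, 1, 0])  # 1 = referent object, 0 = non-referent

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score a new candidate object from the combined speaker/addressee cues.
candidate = np.array([[0.65, 0.50, 15.0, 1]])
print(clf.predict_proba(candidate))  # estimated probability the object is the referent
```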
