Abstract

Referring expressions in multimodal dialogues differ from those in language-only dialogues: they often refer to items indicated by a gesture or by visual means. In this article we classify referring expressions into two types, deictic references and anaphoric references, and propose two general methods to resolve them. The first is a simple mapping algorithm that finds the items referred to on a screen, with or without pointing gestures. The second is a centering algorithm with a dual cache model, which extends Walker's centering algorithm to a multimodal dialogue system and is well suited to resolving the various anaphoric references that arise in multimodal dialogue. In experiments, the proposed system correctly resolved 376 of 405 referring expressions in 40 dialogues (0.54 referring expressions per utterance), a correctness of 92.84 percent.
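The abstract does not give either algorithm's details, so the following is only a minimal Python sketch of the two ideas under stated assumptions. The names `ScreenItem`, `resolve_deictic`, and `DualCacheCenter` are hypothetical; the bounding-box hit test, the cache size, and the preference for linguistic over visual centers are illustrative choices, not the paper's actual rules. The dual cache here simply splits entities introduced in speech from entities made salient on screen, which is one plausible reading of extending Walker's cache model to a multimodal setting.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScreenItem:
    """A displayed item with its bounding box (hypothetical representation)."""
    name: str
    x: float
    y: float
    w: float
    h: float

def resolve_deictic(items: list[ScreenItem],
                    point: Optional[tuple[float, float]]) -> Optional[ScreenItem]:
    """Map a deictic expression to a screen item.

    With a pointing gesture, pick the item whose bounding box contains the
    pointed coordinates; without one, fall back to the only item on screen.
    The fallback rule is an assumption, not the paper's stated behavior.
    """
    if point is not None:
        px, py = point
        for item in items:
            if item.x <= px <= item.x + item.w and item.y <= py <= item.y + item.h:
                return item
        return None
    return items[0] if len(items) == 1 else None

class DualCacheCenter:
    """Sketch of a dual-cache centering store: one cache for entities
    mentioned in speech, one for entities displayed visually.
    Cache size and ranking are illustrative assumptions."""

    def __init__(self, size: int = 3):
        self.size = size
        self.linguistic: list[str] = []  # most recently mentioned first
        self.visual: list[str] = []      # most recently displayed first

    def _push(self, cache: list[str], entity: str) -> None:
        if entity in cache:
            cache.remove(entity)
        cache.insert(0, entity)
        del cache[self.size:]  # evict the least salient entries

    def mention(self, entity: str) -> None:
        self._push(self.linguistic, entity)

    def display(self, entity: str) -> None:
        self._push(self.visual, entity)

    def resolve_anaphor(self) -> Optional[str]:
        # Prefer the most salient linguistic center, then the visual one
        # (an assumed ordering for illustration).
        if self.linguistic:
            return self.linguistic[0]
        return self.visual[0] if self.visual else None

if __name__ == "__main__":
    items = [ScreenItem("flight_KE702", 10, 10, 100, 30)]
    hit = resolve_deictic(items, (50, 25))   # pointing inside the box
    print(hit.name if hit else None)          # -> flight_KE702

    cc = DualCacheCenter()
    cc.display("flight_KE702")                # item shown on screen
    cc.mention("Seoul")                       # entity mentioned in speech
    print(cc.resolve_anaphor())               # -> Seoul
```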
