This paper deals with three interrelated topics: linguistic anaphora, multi-modal anaphora, and the top-down broadcasting of information using gestural post-holds in multi-modal dialogue. First, a new account of definite, pronominal, and pro-adverbial anaphora is given, based on the idea that an existentially quantified general term may output a definite reference. This approach is extended to multi-modal anaphora, where part or all of an anaphor’s meaning is contributed by a sequence of iconic or deictic gestures. Anaphora exploit the semantic potential of their antecedents; they work, as tradition has it, “bottom-up”. An inverse relation, more general than cataphora and investigated here for the first time, is “broadcasting”, where information is freely distributed top-down and input to receiving sites (ports). Anaphora are modelled with this same top-down mechanism, and the same applies to coherence relations in dialogue, which generally show anaphora-like behaviour. Broadcasting can be used in the context of anaphors, for example to provide their gestural meaning parts, but also for a verb’s multi-modal arguments referring to a location, a direction, or an area. In the multi-modal data, broadcasting is shown to be frequently tied to gestural post-holds, the holding of a gesture’s stroke information independently of semantically alignable speech. This leads to a new perspective on post-holds, stressing their speech-independent function and their relevance for indicating topic continuity. We show that multi-modal anaphora, and especially broadcasting, cross single contributions and turns. The data that led us to develop these perspectives come from the SaGA (Speech and Gesture Alignment) corpus, a set of route-description dialogues generated in a VR setting incorporating marker-based eye-tracking facilities. The calculus used to model the anaphora and broadcasting dynamics is the concurrent λΨ-calculus, a recently developed two-tiered machinery that uses a Ψ-calculus for input-output, data transport, and broadcasting. The transported data are in a typed λ-calculus format incorporating Neo-Davidsonian representations; they can be linguistic, gestural only, or multi-modal. Multi-modal informational chunks are modelled as communicating agents sending and receiving information via input-output channels. They are introduced incrementally on an empirically motivated basis: construction only, gesture plus construction, or gesture only. The λΨ-calculus is also used for the fusion component unifying gestural and linguistic information; hence, the paper is also a contribution to the multi-modal fusion of linguistic and gestural input. Finally, it is shown how the presented algorithm can capture multi-modal coherence relations and multi-modal anaphora resolution based on PTT ideas.
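The broadcasting architecture sketched in the abstract, informational chunks as communicating agents that send and receive over input-output channels, can be pictured with a small concurrency example. The Haskell fragment below is a minimal illustration only, not the authors’ λΨ-calculus: the types `Chunk` and `Modality`, the port names, and the Neo-Davidsonian-style predication strings are all hypothetical stand-ins for the paper’s typed λ-calculus data. It merely shows the top-down delivery pattern in which one broadcast reaches several receiving sites (ports).

```haskell
-- Minimal sketch (not the authors' λΨ-calculus): chunks of linguistic or
-- gestural information are broadcast top-down and delivered to every
-- receiving site ("port") via duplicated channels.
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, dupChan, writeChan, readChan)
import Control.Monad (forM_, replicateM_)

-- A chunk carries a modality tag and a Neo-Davidsonian-style predication
-- (here just a string standing in for a typed λ-calculus term).
data Modality = Linguistic | Gestural | MultiModal deriving Show
data Chunk = Chunk { modality :: Modality, content :: String } deriving Show

-- A receiving site: reads two broadcast chunks and reports them.
port :: String -> Chan Chunk -> IO ()
port name ch = replicateM_ 2 $ do
  c <- readChan ch
  putStrLn (name ++ " received: " ++ show c)

main :: IO ()
main = do
  bus <- newChan  -- the broadcasting channel
  -- Each port gets its own duplicate of the bus, so every chunk written
  -- to the bus is delivered to all ports (top-down broadcast).
  forM_ ["anaphor-port", "verb-arg-port"] $ \n -> do
    local <- dupChan bus
    _ <- forkIO (port n local)
    return ()
  -- Broadcast a gestural post-hold chunk and a linguistic chunk.
  writeChan bus (Chunk Gestural   "hold(e1) & round(e1)")
  writeChan bus (Chunk Linguistic "church(x) & refersVia(x, e1)")
  threadDelay 100000  -- crude wait for the receiver threads (sketch only)
```

In this toy setup both ports see both chunks independently, mirroring the idea that broadcast information is freely distributed and consumed at whichever sites can use it, for instance an anaphor’s gestural meaning part or a verb’s multi-modal locational argument.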