Dependencies of discourse structure on the modality of communication

Philip R Cohen

doi:10.1145/1056663.1056716

Abstract

With the genesis of speech understanding systems, computational linguistics research is becoming less wedded to teletype-based interaction. It would therefore be wise to look for systematic ways in which the discourse structure of voice interaction differs from that of teletype interaction. Such discoveries could lead to substantially different discourse components for speech understanding systems.

Rubin (1980) points out that language experiences should not simply be characterized as oral or written. Rather, there is a set of dimensions along which language experiences such as having a conversation and writing/reading a letter might differ, including: the ability to interact, the sharing of space and time between speaker and hearer, the concreteness of referents, and the use of voice or print.

Following Rubin's taxonomy, and influenced by Chapanis et al.'s [1977] communication mode study and Grosz' [1977] task-oriented dialogue work, we conducted a study to explore how the structure of an instruction-giving discourse depends on the communication situation in which it takes place. Twenty-five subjects were videotaped as they instructed twenty-five others in assembling a toy water pump. Five "dialogues" took place in each of five modes: face-to-face, telephone, teletype, audiotape, and written text. We chose to analyze telephone and teletype dialogues first since results would have direct implications for the design of speech understanding and production systems.

Preliminary results indicate that the structure of telephone dialogues is markedly different from that of teletype dialogues. In telephone mode, speakers frequently and explicitly, though often indirectly, request hearers to identify the referents of noun phrases. For example, utterances used indirectly to perform such requests include "there is a NP" and "the NP?". In contrast, teletype "speakers" rarely accomplish the goal of referring in a separate step. Instead, the goals of referring and requesting an assembly action are achieved with one utterance, usually an imperative such as "Insert the green plunger into the large tube with threads on one end".

As for computational implications, we are led to conclude that, within the framework of a plan-based theory of speech acts (Perrault and Allen [1980]), referent identification should be treated as a planned action by language production and comprehension systems. That is, by planning to facilitate the hearer's plan, producers should design their noun phrases so that hearers can identify the referents. Conversely, comprehenders should reason about what the producer intended to be done with the uttered NP -- find its referent, supply a co-referring NP, etc. Our goal, then, is to develop production and comprehension systems capable of reasoning about reference.

Full Text