This paper describes the response planning and generation components of the Mercury flight reservation system, a mixed-initiative spoken dialogue system that supports both voice-only interaction and multi-modal interaction, in which spoken inputs are augmented with typing or clicking on a displayed Web page. Mercury is configured using the Galaxy Communicator architecture (Seneff, Hurley, Lau, Schmid, & Zue, 1998), in which a suite of servers interact via program control mediated by a central hub. Language generation is performed in two steps. Response planning, or deep-structure generation, is carried out by the dialogue manager and is well integrated with other aspects of dialogue control; control flow is specified by a dialogue control table (Seneff & Polifroni, 2000a). Response generation, or surface-form generation, is executed by a separate language generation server, under the guidance of a set of recursive generation rules and an associated lexicon (Baptist & Seneff, 2000). The textual string for the graphical interface and the marked-up synthesis string for spoken output are both generated from a shared set of generation rules (Seneff & Polifroni, 2000b). Thus there is a direct meaning-to-speech mapping that eliminates the need to analyze linguistic structure for synthesis. To date, we have collected over 25 000 utterances from users interacting with the Mercury system. We report here both on the results of user satisfaction studies conducted by the National Institute of Standards and Technology (NIST) and on our own tabulation of a number of different measures of dialogue success.
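The two-step generation process can be illustrated with a minimal sketch. This is not the Mercury implementation: the `Frame` class, the `RULES` and `LEXICON` tables, the `<emph>` markup tag, and the functions `plan_response` and `realize` are all hypothetical names chosen for illustration. The sketch only shows the general idea that a planner emits a deep-structure frame, and that a single shared rule set can then be realized either as display text or as a marked-up synthesis string.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical deep-structure frame produced by response planning."""
    clause: str
    slots: dict = field(default_factory=dict)


# Hypothetical shared generation rules: one template per clause type,
# used for both the textual and the spoken output.
RULES = {
    "confirm_flight": "I have {airline} flight {number} departing at {time}.",
}

# Hypothetical lexicon: per-modality surface forms for slot values.
# The <emph> tag stands in for whatever synthesis markup a real system uses.
LEXICON = {
    "UA": {"text": "United", "speech": "<emph>United</emph>"},
}


def plan_response(dialogue_state: dict) -> Frame:
    """Step 1 (response planning): build a frame from the dialogue state."""
    return Frame("confirm_flight", dict(dialogue_state))


def realize(frame: Frame, mode: str) -> str:
    """Step 2 (surface generation): apply the shared rule for this clause,
    looking up each slot value in the lexicon for the requested mode and
    falling back to the raw value when there is no lexicon entry."""
    template = RULES[frame.clause]
    values = {
        name: LEXICON.get(value, {}).get(mode, value)
        for name, value in frame.slots.items()
    }
    return template.format(**values)


frame = plan_response({"airline": "UA", "number": "123", "time": "9:05 am"})
display_text = realize(frame, "text")
synthesis_string = realize(frame, "speech")
```

Because both outputs come from the same frame and the same rule, the synthesis string carries its markup directly from the meaning representation, with no need to re-analyze the generated text.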