WIP: Integrating text and graphics design for adaptive information presentation

Wolfgang Wahlster, Winfried Graf, Elisabeth André, Hans-Jürgen Profitlich, Thomas Rist, Wolfgang Finkler, Anne Schauder

doi:10.1007/3-540-55399-1_23

Abstract

When explaining how to use a technical device, humans will often use a combination of language and graphics. It is a rare instruction manual that does not contain illustrations. Multimodal presentation systems combining natural language and graphics take advantage of both the individual strengths of each communication mode and the fact that both modes can be employed in parallel. It is an important goal of this research not simply to merge the verbalization results of a natural language generator and the visualization results of a knowledge-based graphics design component, but to carefully coordinate natural language and graphics in such a way that they generate a multiplicative improvement in communication capabilities. Allowing all of the modalities to refer to and depend upon each other is a key to the richness of multimodal communication. In the WIP system, which plans and coordinates multimodal presentations in which all material is generated by the system, we have integrated multiple AI components such as planning, knowledge representation, natural language generation, and graphics generation. The current prototype of WIP generates multimodal explanations and instructions for assembling, using, maintaining, or repairing physical devices. As we try to substantiate our results with cross-language and cross-application evidence, WIP is currently able to generate simple German or English explanations for using an espresso machine or assembling a lawn mower. In WIP we combined and extended only formalisms that have reached a certain level of maturity: in particular, terminological logics, RST-based planning, constraint processing techniques, and tree adjoining grammars with feature unification.
One of the important insights we gained from building the WIP system is that it is actually possible to extend and adapt many of the fundamental concepts developed to date in AI and computational linguistics for the generation of natural language in such a way that they become useful for the generation of graphics and text-picture combinations as well. This means that an interesting methodological transfer from the area of natural language processing to a much broader computational model of multimodal communication seems possible. In particular, semantic and pragmatic concepts like coherence, focus, communicative act, discourse model, reference, implicature, anaphora, or scope ambiguity take on an extended meaning in the context of text-picture combinations. A basic principle underlying the WIP model is that the various constituents of a multimodal presentation should be generated from a common representation of what is to be conveyed. This raises the question of how to decompose a given communicative goal into subgoals to be realized by the mode-specific generators, so that they complement each other. To address this problem, we explored computational models of the cognitive decision processes coping with questions such as what should go into text, what should go into graphics, and which kinds of links between the verbal and non-verbal fragments are necessary. The task of the knowledge-based presentation system WIP is the context-sensitive generation of a variety of multimodal documents from an input including a presentation goal. The presentation goal is a formal representation of the communicative intent specified by the back-end application system. WIP is a highly adaptive interface, since all of its output is generated on the fly and customized for the intended target audience and situation.
The quest for adaptation is based on the fact that it is impossible to anticipate the needs and requirements of each potential user in an infinite number of presentation situations. We view the design of multimodal presentations, including text and graphics design, as a subarea of general communication design. We approximate the fact that communication is always situated by introducing generation parameters (see Fig. 1) in our model. The current system includes a choice between user
