Building a Practical Multimodal System with a Multimodal Fusion Module

Yong Sun, Fang Chen, Yu (David) Shi, Vera Chung

doi:10.1007/978-3-642-02577-8_11

Abstract

A multimodal system is a system equipped with a multimodal interface through which a user can interact with the system using his/her natural communication modalities, such as speech, gesture, and eye gaze. To understand a user's intention, multimodal input fusion, a critical component of a multimodal interface, integrates the user's multimodal inputs and finds their combined semantic interpretation. As powerful yet affordable input and output technologies, such as speech recognition and eye tracking, become available, it becomes possible to attach recognition technologies to existing applications through a multimodal input fusion module and thereby build a practical multimodal system. This paper documents our experience in building a practical multimodal system with our multimodal input fusion technology. A pilot study was conducted on the multimodal system, and based on observations from this study, we lay out implications for multimodal interface design.
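To make the idea of integrating multimodal inputs into a combined semantic interpretation more concrete, the following is a minimal sketch of one simple, commonly used approach: binding deictic words in speech ("this", "there") to the pointing or gaze events closest in time. It is not the fusion technique developed in this paper, which the abstract does not describe. The record types `SpeechToken` and `GestureEvent`, the function `fuse`, the object identifiers, and the 1.5 s time window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical input records; a real system would receive these from
# a speech recognizer and a gesture or eye-gaze tracker.
@dataclass
class SpeechToken:
    word: str
    time: float  # seconds since session start

@dataclass
class GestureEvent:
    target: str  # identifier of the object pointed or gazed at
    time: float  # seconds since session start

DEICTIC_WORDS = {"this", "that", "here", "there"}

def fuse(tokens: List[SpeechToken], gestures: List[GestureEvent],
         window: float = 1.5) -> Optional[List[str]]:
    """Greedily bind each deictic word to the nearest unused gesture
    within `window` seconds (an assumed, illustrative threshold).
    Returns the combined interpretation, or None if a reference
    cannot be resolved."""
    used = set()
    interpretation: List[str] = []
    for tok in tokens:
        if tok.word.lower() not in DEICTIC_WORDS:
            interpretation.append(tok.word)  # non-deictic word kept as-is
            continue
        best: Optional[int] = None
        for i, g in enumerate(gestures):
            if i in used:
                continue
            gap = abs(g.time - tok.time)
            if gap <= window and (best is None
                                  or gap < abs(gestures[best].time - tok.time)):
                best = i
        if best is None:
            return None  # no gesture close enough in time to resolve the word
        used.add(best)
        interpretation.append(gestures[best].target)
    return interpretation

# Example: the user says "move this there" while pointing at two objects.
tokens = [SpeechToken("move", 0.0), SpeechToken("this", 0.4), SpeechToken("there", 1.2)]
gestures = [GestureEvent("pump_3", 0.5), GestureEvent("tank_B", 1.3)]
result = fuse(tokens, gestures)
print(result)  # ['move', 'pump_3', 'tank_B']
```

The greedy nearest-in-time rule keeps the sketch short; practical fusion modules typically also check that the modalities agree semantically (for example, that the object pointed at can actually be moved) before accepting a combined interpretation.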
