Abstract
Multimodal interaction has become a recent research focus because it offers a better user experience in augmented reality (AR) systems. However, most existing works combine only two modalities at a time, e.g., gesture and speech; multimodal interactive systems that integrate gaze cues have rarely been investigated. In this article, we propose a multimodal interactive system that integrates gaze, gesture, and speech in a flexibly configurable AR system. Our lightweight head-mounted device simultaneously supports accurate gaze tracking, hand gesture recognition, and speech recognition. The system can be easily configured into various modality combinations, which enables us to investigate the effects of different interaction techniques. We evaluate the efficiency of these modalities on two tasks: a lamp brightness adjustment task and a cube manipulation task. We also collect subjective feedback on the use of such systems. The experimental results demonstrate that the Gaze+Gesture+Speech modality is superior in terms of efficiency, whereas the Gesture+Speech modality is preferred by users. Our system opens a pathway toward a multimodal interactive AR system with flexible configuration.