Abstract
Multimodal interaction has become a recent research focus because it offers a better user experience in augmented reality (AR) systems. However, most existing works combine only two modalities at a time, e.g., gesture and speech; multimodal interactive systems that integrate gaze cues have rarely been investigated. In this article, we propose a multimodal interactive system that integrates gaze, gesture, and speech in a flexibly configurable AR system. Our lightweight head-mounted device simultaneously supports accurate gaze tracking, hand gesture recognition, and speech recognition. The system can be easily configured into various modality combinations, which enables us to investigate the effects of different interaction techniques. We evaluate the efficiency of these modalities on two tasks: a lamp brightness adjustment task and a cube manipulation task. We also collect subjective feedback on the use of such systems. The experimental results demonstrate that the Gaze+Gesture+Speech modality is superior in terms of efficiency, whereas the Gesture+Speech modality is preferred by users. Our system opens a pathway toward a multimodal interactive AR system with flexible configuration.