Abstract

Voice assistants are gaining adoption because they are easy and intuitive to use. While voice remains the dominant mode of interaction between humans and voice assistants, some embodiments also provide alternative modalities, most commonly visual or touch interfaces. This paper describes the end-to-end working of a multimodal voice assistant, then takes a deep dive into the challenges specific to multimodal voice assistants. It next presents mitigation strategies for the challenges that arise in multimodal interaction scenarios, followed by architectural and design guidelines intended to support a seamless user experience. Finally, future research areas are identified.

