Multimodal interaction is a transformative human-computer interaction (HCI) approach that lets users communicate with systems through multiple channels, such as speech, gesture, touch, and gaze. With advances in sensor technology and machine learning (ML), multimodal systems are becoming increasingly important in applications including virtual assistants, intelligent environments, healthcare, and accessibility technologies. This survey provides a concise overview of recent advances in multimodal interaction, interfaces, and communication. It examines how different input and output modalities are integrated, focusing on key technologies and essential considerations in multimodal fusion, including temporal synchronization and decision-level integration. Furthermore, the survey explores the challenges of developing context-aware, adaptive systems that provide seamless and intuitive user experiences. Finally, by examining current methodologies and trends, this study highlights the potential of multimodal systems and outlines directions for future research.