Multimodal Interaction on the Move

Abstract

This book is a study of around seven hours of naturally occurring video data, recorded by the author in the Italian-speaking part of Switzerland. Drawing on the methodology of Conversation Analysis, Gazin analyses instructional sequences of interaction during driving lessons. The temporal constraints of mobility make driving lessons a rich setting for the investigation of sequence organisation and action constitution. The author identifies different types of actions that compose the unfolding driving and instructing activity, and their turn-constructional features (e.g. different verb forms for specific instructions). The analyses thereby offer insights that inform fundamental concepts like multiactivity and multimodality. The investigations in this book contribute to an increased understanding of the mechanisms of human interaction in general and in mobile settings more specifically.

Similar Papers
  • Book Chapter
  • Citations: 28
  • 10.1145/3015783.3015791
Using cognitive models to understand multimodal processes: the case for speech and gesture production
  • Apr 24, 2017
  • Stefan Kopp + 1 more

Multimodal behavior has been studied for a long time and in many fields, e.g., in psychology, linguistics, communication studies, education, and ergonomics. One of the main motivations has been to allow humans to use technical systems intuitively, in a way that resembles and fosters human users' natural way of interacting and thinking [Oviatt 2013]. This has sparked early work on multimodal human-computer interfaces, including recent approaches to recognize communicative behavior and even subtle multimodal cues by computer systems. Those approaches, for the most part, rest on machine learning techniques applied to large sets of behavioral data. As datasets grow larger in size and coverage, and computational power increases, suitable data-driven techniques are able to detect correlational behavior patterns that support answering questions like which feature(s) to take into account or how to recognize them in specific contexts. However, natural multimodal interaction in humans entails a plethora of behavioral variations and intricacies (e.g., when to act unimodally vs. multimodally, or which specific behaviors and which multi-level coordination between them to use). Possible underlying patterns are hard to detect, even in large datasets, and such variations are often attributed to context-dependencies or individual differences. How they come about is still hard to explain at the behavioral level.

  • Research Article
  • Citations: 45
  • 10.2200/s00636ed1v01y201503hci030
The Paradigm Shift to Multimodality in Contemporary Computer Interfaces
  • Apr 13, 2015
  • Synthesis Lectures on Human-Centered Informatics
  • Sharon Oviatt + 1 more

During the last decade, cell phones with multimodal interfaces based on combined new media have become the dominant computer interface worldwide. Multimodal interfaces support mobility and expand the expressive power of human input to computers. They have shifted the fulcrum of human-computer interaction much closer to the human. This book explains the foundation of human-centered multimodal interaction and interface design, based on the cognitive and neurosciences, as well as the major benefits of multimodal interfaces for human cognition and performance. It describes the data-intensive methodologies used to envision, prototype, and evaluate new multimodal interfaces. From a system development viewpoint, this book outlines major approaches for multimodal signal processing, fusion, architectures, and techniques for robustly interpreting users' meaning. Multimodal interfaces have been commercialized extensively for field and mobile applications during the last decade. Research also is growing rapidly in areas like multimodal data analytics, affect recognition, accessible interfaces, embedded and robotic interfaces, machine learning and new hybrid processing approaches, and similar topics. The expansion of multimodal interfaces is part of the long-term evolution of more expressively powerful input to computers, a trend that will substantially improve support for human cognition and performance.

  • Conference Instance
  • 10.1145/2666242
Proceedings of the 2014 workshop on Understanding and Modeling Multiparty, Multimodal Interactions
  • Nov 16, 2014

It is our great pleasure to welcome you to the ICMI 2014 Workshop on Understanding and Modeling Multiparty, Multimodal Interactions (UM3I 2014). The workshop will highlight recent developments and adopted methodologies in the analysis and modeling of multiparty and multimodal interactions, working towards design and implementation principles for related human-machine interfaces. UM3I 2014 aims to explore this growing field of multiparty multimodal interaction by bridging this multidisciplinary area and bringing together researchers from the domains of multimodal signal processing, dialog systems, human-computer interaction, human-robot interaction, multimodal conversation analysis and multimodal user interfaces. The call for papers attracted submissions from Asia, Europe, and the United States. The program committee reviewed 8 full technical papers and accepted all 8 (100%). We also encourage participants to attend the keynote presentation, a valuable and insightful talk that can guide us to a better understanding of the domain of interest: "From Modeling Multimodal and Multiparty Interactions to Designing Conversational Agents" by Yukiko Nakano (Seikei University).

  • Supplementary Content
  • 10.21954/ou.ro.0000eefd
Impact on the knowledge construction process of multimedia online interactions in audio-graphic conferencing systems: the case of adult distance learners of French
  • Jan 1, 2014
  • Open Research Online (The Open University)
  • Chahrazed Mirza

Online researchers suggest that synchronous audio-graphic conferencing systems provide different mediational tools that create different mediated educational interactions supporting the collaborative process of meaning construction. However, the existing literature does not indicate whether the quality of multimodal online interactions and the affordances of the synchronous medium can effectively enhance this process. This thesis brings together two lines of research. The first develops a methodological framework for the presentation and analysis of multimodal online interactions, drawing on the socio-constructivist understanding that the process of meaning construction is both social and individual. The second is concerned with the analysis of online multimodal discussions; it examines the interrelationship between the different tools of communication and the different affordances of their simultaneous and single use, which may hinder or promote the collaborative process of meaning construction. The design of this research focuses on interaction patterns and examines the extent to which online discussions, mediated by the different tools of communication, reach high levels of collaborative meaning construction. This study assumes the knowledge construction process to be empirically observable through analysing online interactions and students' perceptions of the learning experiences. It examines, through interviews, questionnaires and video recordings of online tutorials, the quality of the online learning experiences of two different UK Open University tutorial groups learning French. Results show that: participants make different multimodal choices, which lead to different patterns of multimodal interactions and online exchanges that affect participants' engagement in the collaborative meaning construction process differently; the single and the simultaneous use of the different tools of communication create different affordances for participants to perform different interactive and communicative roles; and the multimodal competencies of students and tutors, the tutors' styles and the task design play an important role in supporting the collaborative meaning construction process.

  • Conference Article
  • Citations: 13
  • 10.1145/1180495.1180521
Usability evaluation of the EPOCH multimodal user interface
  • Nov 1, 2006
  • Panagiotis Petridis + 3 more

This paper expands on the presentation of a methodology that provides a technology-enhanced exhibition of a cultural artefact through the use of a safe hybrid 2D/3D multimodal interface. Such tangible interactions are based on the integration of a 3DOF orientation tracker and information sensors with a 'Kromstaf' rapid prototype replica to provide tactile feedback. The multimodal interface allows the user to manipulate the object via physical gestures which, during evaluation, establish a profound level of virtual object presence and user satisfaction. If a user cannot manipulate the virtual object effectively, many application-specific tasks cannot be performed. This paper assesses the usability of the multimodal interface by comparing it with two input devices: the Magellan SpaceMouse and a 'black box' that contains the same electronics as the multimodal interface but without the tactile feedback offered by the 'Kromstaf' replica. A complete human-centred usability evaluation was conducted utilizing task-based measures in the form of memory recall investigations after exposure to the interface, in conjunction with perceived presence and user satisfaction assessments. Fifty-four participants across three conditions (Kromstaf, SpaceMouse and black box) took part in the evaluation.
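The core of the tangible interaction is that rotating the tracked replica rotates the virtual artefact on screen. A minimal sketch of that mapping is shown below, assuming the tracker reports yaw/pitch/roll in degrees; the angle convention and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def euler_to_matrix(yaw_deg: float, pitch_deg: float, roll_deg: float) -> np.ndarray:
    """Build a rotation matrix from yaw/pitch/roll (Z-Y-X convention, degrees)."""
    y, p, r = np.radians([yaw_deg, pitch_deg, roll_deg])
    rz = np.array([[np.cos(y), -np.sin(y), 0],
                   [np.sin(y),  np.cos(y), 0],
                   [0, 0, 1]])
    ry = np.array([[np.cos(p), 0, np.sin(p)],
                   [0, 1, 0],
                   [-np.sin(p), 0, np.cos(p)]])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    return rz @ ry @ rx

def apply_tracker_pose(vertices: np.ndarray, yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Rotate the virtual artefact's vertices (N x 3) by the replica's current orientation."""
    return vertices @ euler_to_matrix(yaw, pitch, roll).T

# Example: one tracker sample (yaw 90 degrees) rotates a point on the x-axis onto the y-axis.
print(apply_tracker_pose(np.array([[1.0, 0.0, 0.0]]), yaw=90, pitch=0, roll=0))
```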

  • Research Article
  • Citations: 6
  • 10.1109/tpami.2025.3565194
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving.
  • Aug 1, 2025
  • IEEE transactions on pattern analysis and machine intelligence
  • Zeyu Yang + 5 more

Existing top-performance autonomous driving systems typically rely on a multi-modal fusion strategy for reliable scene understanding. This design, however, is fundamentally limited because it overlooks modality-specific strengths, ultimately hampering model performance. To address this limitation, in this work we introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout, enabling their unique characteristics to be exploited during the whole perception pipeline. To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Specifically, the encoder is implemented as a dual-stream Transformer with specialized attention operations for information exchange and integration between separate modality-specific representations. Our multi-modal representational learning incorporates both object-centric, precise sampling-based feature alignment and global dense information spreading, essential for the more challenging planning task. The decoder is designed to iteratively refine the predictions by alternately aggregating information from the separate representations in a unified, modality-agnostic manner, realizing multi-modal predictive interaction. Extensive experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
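The encoder described here keeps separate per-modality streams and exchanges information between them with attention rather than fusing them into one representation. A minimal, generic PyTorch sketch of one such representational-interaction layer follows; the tensor shapes, layer sizes, and exact attention arrangement are assumptions for illustration, not the actual DeepInteraction++ architecture.

```python
import torch
import torch.nn as nn

class DualStreamInteractionLayer(nn.Module):
    """One encoder layer: each modality attends to itself, then to the other modality."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        mods = ("lidar", "camera")
        self.self_attn = nn.ModuleDict({m: nn.MultiheadAttention(dim, heads, batch_first=True) for m in mods})
        self.cross_attn = nn.ModuleDict({m: nn.MultiheadAttention(dim, heads, batch_first=True) for m in mods})
        self.norm = nn.ModuleDict({m: nn.LayerNorm(dim) for m in mods})

    def forward(self, feats: dict) -> dict:
        out = {}
        for m, other in (("lidar", "camera"), ("camera", "lidar")):
            x = feats[m]
            x = x + self.self_attn[m](x, x, x)[0]                          # intra-modal refinement
            x = x + self.cross_attn[m](x, feats[other], feats[other])[0]   # inter-modal exchange
            out[m] = self.norm[m](x)                                       # per-modality stream is kept
        return out

# Example: 100 LiDAR tokens and 200 image tokens, batch of 2.
layer = DualStreamInteractionLayer()
feats = {"lidar": torch.randn(2, 100, 256), "camera": torch.randn(2, 200, 256)}
out = layer(feats)
print(out["lidar"].shape, out["camera"].shape)  # torch.Size([2, 100, 256]) torch.Size([2, 200, 256])
```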

  • Book Chapter
  • Citations: 8
  • 10.1007/978-3-319-08491-6_8
Multimodal Human-Computer Interfaces Based on Advanced Video and Audio Analysis
  • Jan 1, 2014
  • A Czyżewski + 4 more

The history of multimodal interface development is reviewed briefly in the introduction. Some applications of multimodal interfaces to education software for disabled people are presented. One of them, the LipMouse, is a novel vision-based human-computer interface that tracks the user's lip movements and detects lip gestures. A new approach to diagnosing Parkinson's disease is also shown: the progression of the disease can be measured using the UPDRS (Unified Parkinson Disease Rating Scale), which evaluates motor and behavioral symptoms of Parkinson's disease, based on the multimodal interface called Virtual-Touchpad (VTP) that supports medical diagnosis. The scent-emitting multimodal computer interface provides an important supplement to the polysensoric stimulation process, playing an essential role in the education and therapy of children with certain developmental disorders. The Smart Pen, a tool for supporting therapy of developmental dyslexia, is presented and the results achieved with its application are discussed. The eye-gaze tracking system named Cyber Eye, developed at the Multimedia Systems Department and employed in many kinds of experiments, is presented, including the analysis of the visual activity of patients remaining in a vegetative state and the evaluation of their awareness. The paper is concluded with some general remarks concerning the role of multimodal computer interfaces applied to learning, therapy and everyday use of computerized devices.

Keywords: Reading Comprehension, Gesture Recognition, Developmental Dyslexia, Multimodal Interface, Unified Parkinson Disease Rating Scale.
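One common way to detect a simple lip gesture such as mouth opening, as in vision-based interfaces of the LipMouse kind, is a mouth-aspect-ratio threshold over lip landmarks. The sketch below illustrates only that generic idea; the landmark layout, threshold, and function names are assumptions and do not describe the LipMouse algorithm itself.

```python
import numpy as np

def mouth_aspect_ratio(mouth: np.ndarray) -> float:
    """mouth: (8, 2) array of lip landmarks, ordered as
    [left corner, right corner, three upper-lip points, three lower-lip points]."""
    width = np.linalg.norm(mouth[0] - mouth[1])
    heights = [np.linalg.norm(mouth[2 + i] - mouth[5 + i]) for i in range(3)]
    return float(np.mean(heights) / width)

def detect_open_mouth(mouth: np.ndarray, threshold: float = 0.5) -> bool:
    """A gesture 'click' could be triggered when the ratio exceeds a calibrated threshold."""
    return mouth_aspect_ratio(mouth) > threshold
```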

  • Research Article
  • Citations: 60
  • 10.1016/j.procir.2018.03.224
Deep Learning-based Multimodal Control Interface for Human-Robot Collaboration
  • Jan 1, 2018
  • Procedia CIRP
  • Hongyi Liu + 4 more


  • Book Chapter
  • Citations: 1
  • 10.1007/978-3-540-78331-2_16
Design and evaluation of a multimodal human-multirobot interface
  • Jan 1, 2008
  • Boris Trouvain + 1 more

Supervisory control of mobile multi-robot systems often forces the operator to concurrently process information of multiple visual displays under different task conditions. The approach of this paper is to design and evaluate a multimodal human-robot interface to support the processing of dual tasks. The main innovation of the multimodal interface is the binaural auditory information renderer, which allows shifting of state information from the visual to the auditory channel of the operator. The multimodal interface was evaluated in a laboratory study with 20 participants. Dependent variables were human performance and workload. Independent variables were the interface modality (monomodal vs. multimodal) and the number of robots to be supervised (1 or 2 robots). The results show that the binaural information fusion significantly improves human performance and also significantly lowers the subjective workload (α = 0.05).
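The key idea is to offload robot-state monitoring from the visual to the auditory channel by rendering each robot's status as a spatialized sound. A minimal sketch of amplitude-panning a status tone by the robot's bearing relative to the operator is given below; the paper uses true binaural rendering, so this stereo-gain version is a simplified stand-in with assumed parameters.

```python
import numpy as np

def stereo_status_tone(bearing_deg: float, freq_hz: float = 440.0,
                       duration_s: float = 0.5, sample_rate: int = 44100) -> np.ndarray:
    """Return an (N, 2) stereo buffer: constant-power panning places the tone
    toward the robot's bearing (-90 = hard left, +90 = hard right)."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    tone = 0.3 * np.sin(2 * np.pi * freq_hz * t)
    pan = np.clip(bearing_deg / 90.0, -1.0, 1.0)   # map bearing to [-1, 1]
    angle = (pan + 1.0) * np.pi / 4.0              # constant-power pan law
    left, right = np.cos(angle), np.sin(angle)
    return np.column_stack([tone * left, tone * right])

# Example: robot 1 to the left, robot 2 to the right; different pitches tell them apart.
buf = stereo_status_tone(-60, freq_hz=440) + stereo_status_tone(+60, freq_hz=660)
```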

  • Book Chapter
  • Citations: 18
  • 10.1145/3015783.3015786
Theoretical foundations of multimodal interfaces and systems
  • Apr 24, 2017
  • Sharon Oviatt

This chapter discusses the theoretical foundations of multisensory perception and multimodal communication. It provides a basis for understanding the performance advantages of multimodal interfaces, as well as how to design them to reap these advantages. Historically, the major theories that have influenced contemporary views of multimodal interaction and interface design include Gestalt theory, Working Memory theory, and Activity theory. They include perception-action dynamic theories and also limited resource theories that focus on constraints involving attention and short-term memory. This chapter emphasizes these theories in part because they are supported heavily by neuroscience findings. Their predictions also have been corroborated by studies on multimodal human-computer interaction. In addition to summarizing these three main theories and their impact, several related theoretical frameworks will be described that have influenced multimodal interface design, including Multiple Resource theory, Cognitive Load theory, Embodied Cognition, Communication Accommodation theory, and Affordance theory.

  • Conference Article
  • Citations: 3
  • 10.1109/hsi.2013.6577797
Multimodal human-computer interfaces based on advanced video and audio analysis
  • Jun 1, 2013
  • Andrzej Czyzewski + 6 more

The history of multimodal interface development is reviewed briefly in the introduction. Examples of applications of multimodal interfaces to education software and for disabled people are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with mouth gestures, and an audio interface that stretches speech for hearing-impaired and stuttering people. The Smart Pen, a tool for supporting therapy of developmental dyslexia, is presented and the results achieved with its application are discussed. The eye-gaze tracking system named "Cyber-Eye", developed at the Multimedia Systems Department and employed in many kinds of experiments, is presented, including the analysis of the visual activity of patients remaining in a vegetative state and the evaluation of their awareness. The scent-emitting multimodal computer interface provides an important supplement to the polysensoric stimulation process, playing an essential role in the education and therapy of children with certain developmental disorders. A new approach to diagnosing Parkinson's disease is shown: the progression of the disease can be measured using the UPDRS (Unified Parkinson Disease Rating Scale), which evaluates motor and behavioral symptoms of Parkinson's disease, based on the multimodal interface called Virtual-Touchpad (VTP) that supports medical diagnosis. The paper is concluded with some general remarks concerning the role of multimodal computer interfaces applied to learning, therapy and everyday use of computerized devices.

  • Conference Instance
  • Citations: 59
  • 10.1145/1647314
Proceedings of the 2009 international conference on Multimodal interfaces
  • Nov 2, 2009

It is our great pleasure to welcome you to Cambridge and the joint meeting of the International Conference on Multimodal Interfaces and the workshop on Machine Learning for Multimodal Interaction! This year ICMI and MLMI decided to join forces. The advisory boards of both meetings supported this decision as a way to consolidate the community and expand the range of topics of both meetings. We hope the decision will further improve the quality of this joint meeting and also unify the locus for the novel ideas in the area of Multimodal Interfaces and Interactions. As a result of this effort, this year has seen an increase in submissions. We have nearly 120 papers, 20 demos and 5 workshop and 4 Special session proposals submitted to the conference committee. Out of the 118 papers submitted, 41 were selected for oral and poster presentation, bringing the conference acceptance rate to 35%. Half of the demonstration proposals were accepted, bringing the number of academic demonstrations to ten. We are hosting four post-conference workshops centered on novel topics of multi-modality. Finally, one of the four proposed special sessions was selected for inclusion into the program, where it appears as a collection of six additional invited papers. The review process was organized using the PCS submission and review system, which ICMI has used in the past. We are grateful to James Stewart for his timely and professional support. To streamline the review process, this year we have selected a smaller number of Area Chairs (ACs) who appointed the Program Committee. The papers were allocated to ACs in areas of their expertise according to the indications of the submitters, and then checked for conflicts. ACs distributed the papers to members of program committee and volunteer reviewers for comments. Once reviews were submitted the ACs provided meta-reviews for all papers. The scores of the papers were then collected and tabulated. All reviews and papers were then again checked by the Program Chairs, and papers with highly varying scores received an additional round of reviews. Based on this thorough review process 41 papers were selected for presentation. The program was formed by grouping papers into main topics of interest for this year's conference. Following the trend in the academic meetings to reduce amount of waste we decided to distribute the conference proceedings on USB Flash Drives. We decided that flash drives provide the best tradeoff between cost and flexibility for participants since they can be freely re-used once their content is thoroughly memorized. This year we have selected 6 papers as candidates for two awards: Outstanding Student Paper, sponsored by MERL, and Outstanding Paper, sponsored by Google. An anonymous committee has been selected by Program and General Chairs reviewing 10% of the top scoring papers. You will find the nominated papers in the conference program marked with special symbol. The final award decisions will be made at the conference banquet on Monday evening. The financial crisis has taken its toll on everyone. The US National Science Foundation (NSF) has very generously provided us with travel and housing support for twelve students to help offset pressure on academic travel budgets. Two European academic projects have also contributed significant amount of funds to the conference organization: Augmented Multi-Party Interaction (AMI), and the Swiss National Center of Competence in Research on Interactive Multimodal Information Management (IM2). 
Finally, we thank the European Network of Excellence on Pattern Analysis, Statistical Modeling, and Computational Learning (PASCAL 2) for the funding to support the travel of two of our keynote speakers. Even in these difficult times, many companies affirmed their support of the multimodal interaction and interface research community by providing ICMI-MLMI with a previously unseen level of financial support. All of these organizations deserve our warmest gratitude: Mitsubishi Electric Research Labs, Google, Microsoft Research, Honda Research Institute-US, The Mathworks, and Telefonica! Without their generous support this meeting would not have been possible.

  • Conference Article
  • 10.1109/cw.2009.31
The Use of Multimodal Metaphors on E-learning Note-Taking
  • Jan 1, 2009
  • Mohamed Sallam + 1 more

This paper introduces an empirical study that investigates the use of multimodal metaphors to communicate information in the interface of e-learning applications. The aim of the experiment was to measure and compare the level of usability of textual and multimodal interfaces. The usability parameters considered in the study were efficiency, effectiveness, and users' satisfaction. In order to carry out a comparative investigation, two independent groups evaluated two different interfaces on an experimental e-learning platform. The first group (control) consisted of 22 participants using a text-only interface platform; this platform used Microsoft Word 2007 and its 'adding comments' feature as the modality. The second platform was based on a multimodal interface used by the experimental group and consisted of three multimodal tools to improve the efficiency of e-learning; the modalities used by this experimental group were text, speech and graphics. The results obtained from this investigation show that the multimodal e-learning interface group took less time to complete the experimental tasks, successfully performed a higher number of tasks, and was more satisfied than the textual interface group. Therefore, the multimodal interface provided a more usable interface for the users in terms of efficiency, effectiveness and user satisfaction.
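The study compares two independent groups on efficiency (task time), effectiveness (tasks completed), and satisfaction. A minimal sketch of how such between-group measures can be compared is shown below; the data values and the choice of Welch's t-test are illustrative assumptions, not the paper's actual analysis.

```python
from scipy import stats

# Hypothetical task-completion times in minutes (textual vs. multimodal participants).
textual_times = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]      # one value per participant (truncated here)
multimodal_times = [11.3, 12.0, 10.8, 11.9, 12.4, 11.1]   # one value per participant (truncated here)

# Independent-samples t-test (Welch's variant, no equal-variance assumption).
t_stat, p_value = stats.ttest_ind(multimodal_times, textual_times, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below the chosen alpha (e.g. 0.05) would support the reported efficiency advantage.
```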

  • Conference Article
  • Citations: 1
  • 10.1145/1719970.1720027
A multimodal labeling interface for wearable computing
  • Feb 7, 2010
  • Shanqing Li + 1 more

In wearable environments it is not convenient to label an object with portable keyboards and mice. This paper presents a multimodal labeling interface that solves this problem with natural and efficient operations. Visual and audio modalities cooperate with each other: an object is encircled by visual tracking of a pointing gesture, and meanwhile its name is obtained by speech recognition. In this paper, we propose the concept of a virtual touchpad based on stereo vision techniques. With the touchpad, the object-encircling task is achieved by drawing a closed curve on a transparent blackboard. The touch events and movements of a pointing gesture are robustly detected for natural gesture interactions. The experimental results demonstrate the efficiency and usability of our multimodal interface.
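The interface fuses two streams: a closed curve drawn by the tracked pointing gesture (selecting the object) and a recognized spoken word (naming it). A minimal sketch of that late-fusion step, pairing the two events by temporal proximity, is given below; the event format, field names, and time window are assumptions, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class GestureEvent:
    t: float        # time the closed curve was completed (seconds)
    polygon: list   # [(x, y), ...] encircled region in image coordinates

@dataclass
class SpeechEvent:
    t: float        # time the utterance ended (seconds)
    label: str      # recognized object name

def fuse(gestures, speech, window_s: float = 2.0):
    """Pair each encircling gesture with the closest-in-time spoken label."""
    labeled = []
    for g in gestures:
        candidates = [s for s in speech if abs(s.t - g.t) <= window_s]
        if candidates:
            best = min(candidates, key=lambda s: abs(s.t - g.t))
            labeled.append((g.polygon, best.label))
    return labeled

# Example: a curve finished at t=10.2 s and the word "cup" recognized at t=10.9 s get paired.
print(fuse([GestureEvent(10.2, [(0, 0), (5, 0), (5, 5), (0, 5)])],
           [SpeechEvent(10.9, "cup")]))
```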

  • Conference Article
  • Citations: 9
  • 10.1109/icsmc.2001.972009
The design of multimodal human-machine interface for teleoperation
  • Oct 7, 2001
  • Wusheng Chou + 1 more

Teleoperation is a viable way to project a human operator's intelligence into places that are inaccessible or dangerous to people, or where expertise and resources are not available. Because of the distance between the human operator and the remote environment, the human-machine interface is an important component for overall system performance and efficiency. The paper proposes a new design method for a multimodal teleoperation interface. A distributed graphic predictive display subsystem based on virtual reality is implemented, and all kinds of feedback information acquired from the remote environment, such as actual live images, audio and force information, are organized and presented to human operators in an appropriate way. Experimental results demonstrate that the multimodal human-machine interface can reduce a human operator's mental workload and facilitate teleoperation. Some key technologies concerned with this multimodal interface, such as the synchronization mechanism of the distributed predictive simulation subsystem and the real-time transmission of actual live multimedia via the Internet under narrow bandwidth, are also developed.
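A graphic predictive display lets the operator see an immediate simulated response to a command while the real feedback is still in transit over the network. A minimal sketch of that idea for a single joint follows, assuming a simple first-order local model and a fixed round-trip delay; both are illustrative assumptions, not the paper's simulation subsystem.

```python
from collections import deque

class PredictiveDisplay:
    """Shows a locally simulated joint angle immediately; the command reaches the remote robot 'delay_steps' later."""
    def __init__(self, delay_steps: int = 10, gain: float = 0.2):
        self.predicted = 0.0
        self.gain = gain                            # first-order local model of the remote joint
        self.pending = deque([0.0] * delay_steps)   # commands still travelling to the remote site

    def step(self, commanded_angle: float) -> tuple:
        # Local prediction updates instantly, so the operator gets immediate visual feedback.
        self.predicted += self.gain * (commanded_angle - self.predicted)
        # The remote robot only sees the command after the transmission delay.
        self.pending.append(commanded_angle)
        delayed_command = self.pending.popleft()
        return self.predicted, delayed_command

display = PredictiveDisplay()
for _ in range(15):
    predicted, remote_cmd = display.step(commanded_angle=1.0)
# 'predicted' drives the predictive display; the remote arm lags the operator by the delay.
```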
