Abstract
Within the context of assistive robotics, we develop an intelligent interface that provides multimodal sensory processing for human action recognition. Human action is considered multimodally, with inputs including audio from microphone arrays and visual streams from high-definition and depth cameras. Building on state-of-the-art approaches to automatic speech recognition and visual action recognition, we recognize actions and commands multimodally. By fusing the unimodal information streams, we obtain the optimal multimodal hypothesis, which is then exploited by the active mobility assistance robot within the framework of the MOBOT EU research project. Recognition experiments show that integrating multiple sensors and modalities improves multimodal recognition performance on a newly acquired, challenging dataset of elderly people interacting with the assistive robot.
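To illustrate the kind of fusion the abstract refers to, the following is a minimal sketch of weighted late fusion of unimodal hypothesis scores, not the paper's exact method: each unimodal recognizer (speech, visual action) is assumed to emit per-class log-scores, and the fused hypothesis is the class with the highest weighted sum. The class labels, scores, and stream weights are illustrative assumptions.

```python
from typing import Dict


def fuse_streams(stream_scores: Dict[str, Dict[str, float]],
                 stream_weights: Dict[str, float]) -> str:
    """Return the action label with the highest weighted sum of per-stream log-scores."""
    fused: Dict[str, float] = {}
    for stream, scores in stream_scores.items():
        w = stream_weights.get(stream, 1.0)
        for label, log_score in scores.items():
            fused[label] = fused.get(label, 0.0) + w * log_score
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Hypothetical per-class log-scores from an audio (spoken command) stream
    # and a visual (gesture/action) stream.
    scores = {
        "audio": {"come_closer": -1.2, "stop": -2.5, "help_me_stand": -3.0},
        "visual": {"come_closer": -2.0, "stop": -1.1, "help_me_stand": -2.8},
    }
    # Stream weights are assumed, e.g. tuned on held-out data.
    weights = {"audio": 0.6, "visual": 0.4}
    print(fuse_streams(scores, weights))  # -> "come_closer"
```

In practice the stream weights would be chosen to reflect the reliability of each modality under the deployment conditions (e.g. noisy audio versus occluded video).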