Multimodal integration learning of robot behavior using deep neural networks

Kuniaki Noda, Hiroaki Arie, Yuki Suga, Tetsuya Ogata

doi:10.1016/j.robot.2014.03.003

Open Access

https://doi.org/10.1016/j.robot.2014.03.003

Journal: Robotics and Autonomous Systems
Publication Date: Mar 24, 2014
Citations: 164
License type: CC BY-NC-ND

Affiliation: Waseda University

Abstract

For humans to accurately understand the world around them, multimodal integration is essential because it enhances perceptual precision and reduces ambiguity. Computational models replicating such human ability may contribute to the practical use of robots in daily human living environments; however, primarily because of scalability problems that conventional machine learning algorithms suffer from, sensory-motor information processing in robotic applications has typically been achieved via modal-dependent processes. In this paper, we propose a novel computational framework enabling the integration of sensory-motor time-series data and the self-organization of multimodal fused representations based on a deep learning approach. To evaluate our proposed model, we conducted two behavior-learning experiments utilizing a humanoid robot; the experiments consisted of object manipulation and bell-ringing tasks. From our experimental results, we show that large amounts of sensory-motor information, including raw RGB images, sound spectrums, and joint angles, are directly fused to generate higher-level multimodal representations. Further, we demonstrated that our proposed framework realizes the following three functions: (1) cross-modal memory retrieval utilizing the information complementation capability of the deep autoencoder; (2) noise-robust behavior recognition utilizing the generalization capability of multimodal features; and (3) multimodal causality acquisition and sensory-motor prediction based on the acquired causality.
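As a rough illustration of function (1), cross-modal memory retrieval, the sketch below fuses two synthetic modalities with a linear autoencoder and then recovers one modality from the other. It is a toy under stated assumptions, not the paper's method: the paper uses a deep autoencoder on raw images, sound, and joint angles, whereas this sketch swaps in a tied-weight linear autoencoder obtained from an SVD and infers the shared code by least squares. The data, the dimensions, and the names `vision`, `joints`, and `retrieve_joints` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): two modalities driven by one shared latent cause.
n_samples, n_latent, n_vision, n_joints = 500, 3, 8, 4
latent = rng.normal(size=(n_samples, n_latent))
vision = latent @ rng.normal(size=(n_latent, n_vision))
joints = latent @ rng.normal(size=(n_latent, n_joints))
vision += 0.01 * rng.normal(size=vision.shape)  # small sensor noise
joints += 0.01 * rng.normal(size=joints.shape)

# Fuse the modalities by concatenation, then split into train and held-out sets.
fused = np.hstack([vision, joints])
train, test = fused[:400], fused[400:]

# Linear autoencoder with tied weights: the top singular vectors of the
# centered training data serve as both encoder (W.T) and decoder (W).
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
W = vt[:n_latent]  # shape (n_latent, n_vision + n_joints)


def decode(code):
    """Map a fused code back to the concatenated sensory-motor vector."""
    return code @ W + mean


def retrieve_joints(vision_only):
    """Recover the joint modality when only the vision modality is observed."""
    W_vision = W[:, :n_vision]
    centered = (vision_only - mean[:n_vision]).T
    # Least-squares estimate of the fused code from the observed modality alone.
    code, *_ = np.linalg.lstsq(W_vision.T, centered, rcond=None)
    return decode(code.T)[:, n_vision:]


retrieved = retrieve_joints(test[:, :n_vision])
err = np.abs(retrieved - test[:, n_vision:]).mean()
print(f"mean absolute retrieval error: {err:.4f}")
```

Retrieval works here only because both modalities depend on the same latent cause, so the fused code estimated from one modality also determines the other. A deep autoencoder plays the analogous role for nonlinear, high-dimensional sensory-motor data.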

Full Text