Robot audition, the ability of a robot to listen to several things at once with its own “ears,” is crucial to improving interaction and symbiosis between humans and robots. Because robot audition was originally proposed and pioneered by Japanese research groups, this special issue of the Journal of Robotics and Mechatronics on robot audition technologies covers a wide collection of advanced topics studied mainly in Japan. Specifically, two consecutive JSPS Grants-in-Aid for Scientific Research (S) on robot audition (PI: Hiroshi G. Okuno) from 2007 to 2017, the JST Japan-France Research Cooperative Program on binaural listening for humanoids (PIs: Hiroshi G. Okuno and Patrick Danès) from 2009 to 2013, and the ImPACT Tough Robotics Challenge (PM: Prof. Satoshi Tadokoro) on extreme audition for search and rescue robots, running since 2015, have all contributed to the promotion of robot audition research, and most of the papers in this issue are outcomes of these projects.

Robot audition was surveyed in the special issue on robot audition of the Journal of the Robotics Society of Japan, Vol.28, No.1 (2011), and in our IEEE ICASSP-2015 paper. This issue covers the most recent topics in robot audition, except for human-robot interaction, which has been covered by many papers in Advanced Robotics as well as other journals and international conferences, including IEEE IROS.

This issue consists of twenty-three papers accepted through peer review. They are classified into four categories: signal processing, music and pet robots, search and rescue robots, and monitoring animal acoustics in natural habitats.

In signal processing for robot audition, Nakadai, Okuno, et al. report on the HARK open-source software for robot audition; Takeda, et al. develop noise-robust MUSIC-based sound source localization (SSL); and Yalta, et al. use deep learning for SSL. Odo, et al. develop active SSL by moving artificial pinnae, and Youssef, et al. propose binaural SSL for an immobile or mobile talker. Suzuki, Otsuka, et al. evaluate the influence of six impulse-response measurement signals on MUSIC-based SSL; Sekiguchi, et al. derive an optimal allocation of distributed microphone arrays for sound source separation; and Tanabe, et al. develop 3D SSL using a microphone array and LiDAR. Nakadai and Koiwa present audio-visual automatic speech recognition, and Nakadai, Tezuka, et al. suppress ego-noise, that is, noise generated by the robot itself.

In music and pet robots, Ohkita, et al. propose audio-visual beat tracking for a robot dancing with a human dancer, and Tomo, et al. develop a robot that operates a wayang puppet, an Indonesian world cultural heritage, by recognizing emotion in Gamelan music. Suzuki, Takahashi, et al. develop a pet robot that approaches a sound source.

In search and rescue robots, Hoshiba, et al. implement real-time SSL with a microphone array installed on a multicopter UAV, and Ishiki, et al. design a microphone array for multicopters. Ohata, et al. detect sound sources with a multicopter-mounted microphone array, and Sugiyama, et al. identify detected acoustic events by combining signal processing and deep learning. Bando, et al. enhance human voices, both online and offline, for a hose-shaped rescue robot equipped with a microphone array.

In monitoring animal acoustics in natural habitats, Suzuki, Matsubayashi, et al. design and implement HARKBird; Matsubayashi, et al. report on their experience monitoring birds with HARKBird; and Kojima, et al. use a spatial-cue-based probabilistic model to analyze the songs of birds singing in their natural habitat. Aihara, et al. analyze a chorus of frogs with dozens of Firefly sound-to-light conversion devices, whose design and analysis are reported by Mizumoto, et al.

The editors and authors hope that this special issue will promote the further evolution of robot audition technologies across a diversity of applications.