Intelligent Sound Source Localization and its application to multimodal human tracking

K Nakamura, F Asano, K Nakadai, G Ince

doi:10.1109/iros.2011.6094558

Abstract

We assessed robust human tracking based on intelligent Sound Source Localization (SSL) for a robot in a real environment. SSL is fundamental to robot audition, but it faces three issues in a real environment: limited robustness against high-power noise, the lack of a general framework for selective listening to sound sources, and the difficulty of tracking inactive and/or noisy sound sources. To address the first issue, we extended MUltiple SIgnal Classification (MUSIC) by incorporating Generalized EigenValue Decomposition (GEVD-MUSIC) so that it can cope with high-power noise and select target sound sources. To address the second issue, we proposed Sound Source Identification (SSI) based on hierarchical Gaussian mixture models and integrated it with GEVD-MUSIC to realize a selective listening function. To address the third issue, we integrated audio-visual human tracking using particle filtering. Integrating these three techniques into an intelligent human tracking system showed that: 1) GEVD-MUSIC improved the noise robustness of SSL by 5-6 dB in signal-to-noise ratio; 2) SSI achieved an F-measure above 70% even in a noisy environment; and 3) audio-visual integration reduced the average tracking error by approximately 50%.
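To make the audio-visual integration step more concrete, the following is a minimal sketch of a one-dimensional particle filter that fuses audio and visual azimuth observations. It is an illustration only, not the authors' implementation: the function names, the random-walk motion model, the Gaussian observation models, the noise parameters, and the -90 to 90 degree azimuth range are all assumptions made for this example. A missing observation (for instance, a silent speaker or an occluded face) is represented by `None`.

```python
import math
import random


def gaussian_likelihood(error, sigma):
    """Unnormalized Gaussian likelihood of an observation error (degrees)."""
    return math.exp(-0.5 * (error / sigma) ** 2)


def track_azimuth(audio_obs, visual_obs, n_particles=500,
                  motion_sigma=2.0, audio_sigma=10.0, visual_sigma=3.0,
                  seed=0):
    """Estimate one target's azimuth per frame from two observation streams.

    audio_obs and visual_obs are equal-length lists of azimuths in degrees;
    None marks a frame where that modality produced no observation.
    All parameters are hypothetical defaults chosen for illustration.
    Azimuth wrap-around is ignored for simplicity.
    """
    rng = random.Random(seed)
    # Initialize particles uniformly over the assumed field of view.
    particles = [rng.uniform(-90.0, 90.0) for _ in range(n_particles)]
    estimates = []
    for audio, visual in zip(audio_obs, visual_obs):
        # Predict: random-walk motion model.
        particles = [p + rng.gauss(0.0, motion_sigma) for p in particles]
        # Update: multiply the likelihoods of whichever modalities are present.
        weights = []
        for p in particles:
            w = 1.0
            if audio is not None:
                w *= gaussian_likelihood(audio - p, audio_sigma)
            if visual is not None:
                w *= gaussian_likelihood(visual - p, visual_sigma)
            weights.append(w)
        total = sum(weights)
        if total <= 0.0:
            # All weights underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Estimate: weighted mean of the particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    # Target at 30 degrees; audio drops out mid-sequence, vision at the start.
    audio = [30.0] * 5 + [None] * 5 + [30.0] * 10
    visual = [None] * 3 + [30.0] * 17
    print(round(track_azimuth(audio, visual)[-1], 1))
```

In this sketch the two modalities are combined simply by multiplying their likelihoods, so a frame with only one observation still updates the track, and a frame with none lets the particles diffuse under the motion model. The smaller `visual_sigma` reflects an assumption that vision localizes more precisely than audio when a face is visible.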

Full Text