Abstract

Sound source localization and separation are essential functions for robot audition to comprehend acoustic environments. The widely used multiple signal classification (MUSIC) method can precisely estimate the directions of arrival (DoAs) of multiple sound sources if its hyperparameters are selected appropriately for the surrounding environment. A popular separation method based on a complex Gaussian mixture model (CGMM), on the other hand, can extract multiple sources even in noisy environments if its latent variables are properly initialized to avoid bad local optima. To overcome the drawbacks of both MUSIC and the CGMM, we propose a robot audition framework that combines them complementarily in a probabilistic manner. Our method is based on a variant of the CGMM conditioned on the localization results of MUSIC. The hyperparameters of MUSIC are estimated by type II maximum likelihood estimation of the CGMM, and the CGMM itself is efficiently initialized and regularized using the localization results of MUSIC. Experimental results show that our method outperformed conventional localization and separation methods even when the number of sound sources was unknown. We also demonstrate that our method works in real time even with moving sound sources.
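To make the localization half of the framework concrete, the following is a minimal sketch of the SEVD-MUSIC pseudospectrum for a uniform linear array. The array geometry, analysis frequency, and the assumed number of sources `N_SRC` (a MUSIC hyperparameter, exactly the kind the paper proposes to estimate automatically) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

M = 8            # number of microphones (assumed)
N_SRC = 1        # assumed source count; a key MUSIC hyperparameter
c, f = 343.0, 1000.0   # speed of sound [m/s], analysis frequency [Hz]
d = 0.04               # inter-microphone spacing [m] (assumed)

def steering(theta):
    """Far-field steering vector of a uniform linear array for DoA theta (rad)."""
    delays = d * np.arange(M) * np.sin(theta) / c
    return np.exp(-2j * np.pi * f * delays)

# Simulate an observed spatial covariance: one source at 30 deg plus noise.
rng = np.random.default_rng(0)
true_theta = np.deg2rad(30.0)
T = 500
s = rng.normal(size=T) + 1j * rng.normal(size=T)
X = np.outer(steering(true_theta), s) \
    + 0.1 * (rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T)))
R = X @ X.conj().T / T

# SEVD: eigenvectors of the M - N_SRC smallest eigenvalues span the noise subspace.
w, V = np.linalg.eigh(R)        # eigenvalues in ascending order
E_n = V[:, :M - N_SRC]

# MUSIC pseudospectrum: peaks where steering vectors are orthogonal to E_n.
grid = np.deg2rad(np.arange(-90, 91))
p = np.array([1.0 / np.linalg.norm(E_n.conj().T @ steering(t))**2 for t in grid])
est = np.rad2deg(grid[np.argmax(p)])
print(f"estimated DoA: {est:.0f} deg")
```

In the proposed framework, the peaks of such a spectrum would initialize and regularize the CGMM's latent source assignments, while the CGMM likelihood in turn informs the choice of MUSIC's hyperparameters.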

Highlights

  • Robot audition, which computationally comprehends acoustic environments [1]–[4], is an essential function for robots working in our everyday lives

  • The performance of standard eigenvalue decomposition (SEVD)-based multiple signal classification (MUSIC) deteriorated significantly when its parameter was not selected appropriately, so parameter tuning is essential for MUSIC

  • Our MUSIC-complex Gaussian mixture model (CGMM) successfully avoided this problem by utilizing the MUSIC localization results

Summary

Introduction

Robot audition, which computationally comprehends acoustic environments [1]–[4], is an essential function for robots working in our everyday lives. A rescue robot searching for victims by detecting faint voices or other sounds needs to understand acoustic scenes. Such a robot has to be equipped with a computational audition system enabling it to comprehend when, where, and what kind of sound event happens. Adaptive beamformers and blind source separation (BSS) methods constrained by localization results have been widely used [12], [13]. These systems can work in real time on a low-resource computer (e.g., a laptop) by combining the individual modules. This combination has enabled various applications such as humanoid robots [14]–[16], search-and-rescue drones [17], and tele-existence robots [18].

