Abstract

Although sound source localization is desirable in many communication systems and intelligence applications, distortion caused by diffuse noise or reverberation makes time delay estimation (TDE) between the signals acquired by a pair of microphones a complicated and challenging problem. In this paper, we describe a method that efficiently achieves sound source localization in noisy and reverberant environments. The method is based on the generalized cross-correlation (GCC) function with phase transform (PHAT) weighting (GCC-PHAT), which provides robustness against reverberation. In addition, to make the TDE robust to diffuse components and to further improve the robustness of GCC-PHAT against reverberation, the time-frequency (t-f) components of the observations that are emitted directly by a point source are selected using an “inversed” diffuseness. The diffuseness, which can be estimated from the coherent-to-diffuse power ratio (CDR) based on the spatial coherence between the two microphones, quantifies the contribution of diffuse components on a scale from zero to one, with direct sound from the source modeled as fully coherent. In particular, the “inversed” diffuseness is binarized with a very strict threshold so that only highly reliable components are used, which allows accurate TDE even in noisy and reverberant environments. Experimental results on both simulated and real recorded data consistently demonstrated the robustness of the presented method against diffuse noise and reverberation.
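The selection step described in the abstract can be sketched as a GCC-PHAT whose frequency sum only includes t-f bins that pass the binarized "inversed" diffuseness. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a per-bin diffuseness map D is already available, that "inversed" diffuseness means 1 − D, and that a threshold of 0.9 is a reasonable stand-in for the unspecified "very rigorous" value. The function name `masked_gcc_phat` and the grid search over candidate delays are hypothetical choices made for illustration.

```python
import numpy as np


def masked_gcc_phat(X1, X2, diffuseness, fs, n_fft, max_delay_s,
                    threshold=0.9, n_grid=201, eps=1e-12):
    """TDE from GCC-PHAT restricted to t-f bins judged to be direct sound.

    X1, X2      : STFTs of the two microphones, shape (frames, n_fft // 2 + 1)
    diffuseness : per-bin diffuseness in [0, 1], same shape (estimated elsewhere)
    Returns (delay of mic 2 relative to mic 1 in seconds, boolean mask used).
    """
    # Assumption: "inversed" diffuseness = 1 - D, binarized by a strict threshold.
    mask = (1.0 - diffuseness) > threshold
    cross = X2 * np.conj(X1)
    phat = cross / (np.abs(cross) + eps)  # PHAT: unit magnitude in every bin
    # Sum the phase-only cross-spectra of the selected bins over all frames.
    spectrum = np.sum(np.where(mask, phat, 0.0), axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Evaluate the GCC on a grid of physically possible delays (sub-sample resolution).
    taus = np.linspace(-max_delay_s, max_delay_s, n_grid)
    gcc = np.real(np.exp(2j * np.pi * np.outer(taus, freqs)) @ spectrum)
    return taus[int(np.argmax(gcc))], mask
```

A threshold close to 1 keeps only bins with almost no estimated diffuse energy: fewer bins survive, but each one contributes a phase that actually reflects the direct-path delay.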

Highlights

  • Sound source localization is a desirable technique in various communication systems and intelligence applications, including speech enhancement in noisy and reverberant environments by forming a beam toward the target source [1], [2].

  • The generalized cross-correlation (GCC) with phase transform (PHAT) weighting can provide time delay estimation (TDE) that is robust against reverberation, but it is known to be sensitive to ambient noise because the normalization emphasizes frequency components with small power.

  • To evaluate the performance of the proposed sound source localization method in noisy and reverberant environments, we simulated signals observed at two microphones 12 cm apart from a source in a 5 m × 4 m × 3 m rectangular room.
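The highlights on GCC-PHAT and on the 12 cm microphone spacing fit together in a short worked example. With an assumed speed of sound of 343 m/s, the largest possible inter-microphone delay is 0.12 / 343 ≈ 0.35 ms, or about 5.6 samples at an assumed 16 kHz sampling rate, so the correlation peak only needs to be searched over about ±6 lags. The sketch below is plain (unmasked) GCC-PHAT; the function names and the constant `SPEED_OF_SOUND` are illustrative. Dividing the cross-spectrum by its magnitude gives every frequency bin unit weight, which is why low-power, noise-dominated bins gain as much influence as strong speech bins.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def max_lag_samples(spacing_m, fs, c=SPEED_OF_SOUND):
    """Largest physically possible inter-microphone delay, rounded up to whole samples."""
    return int(np.ceil(spacing_m / c * fs))


def gcc_phat_tde(x1, x2, fs, spacing_m, eps=1e-12):
    """Delay of x2 relative to x1 in seconds (positive: x2 arrives later), via GCC-PHAT."""
    n = len(x1) + len(x2)  # zero-padding avoids circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    # PHAT weighting: discard the magnitude, keep only the phase of every bin.
    cross = cross / (np.abs(cross) + eps)
    cc = np.fft.irfft(cross, n)
    max_lag = max_lag_samples(spacing_m, fs)
    # Keep lags -max_lag..max_lag; negative lags sit at the end of the irfft output.
    cc = np.concatenate((cc[n - max_lag:], cc[:max_lag + 1]))
    lag = int(np.argmax(cc)) - max_lag
    return lag / fs
```

Restricting the search to ±`max_lag` also discards correlation peaks at delays the array geometry cannot produce.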


Summary

INTRODUCTION

Sound source localization is a desirable technique in various communication systems and intelligence applications, including speech enhancement in noisy and reverberant environments by forming a beam toward the target source [1], [2]. It can be achieved by exploiting the differences among the signals captured by spatially separated microphones. To achieve further robustness, masks have been applied to remove time-frequency (t-f) components of the observed signals that are harmful for source localization because they are significantly contaminated by noise or reverberation (e.g., [14], [23]–[27]). If two microphone signals contain only direct sound emitted from a source, they are delayed and scaled versions of each other and are therefore fully coherent, whereas the components caused by diffuse noise or reverberation may be assumed to be diffuse. The contribution of diffuse components to the microphone signals, on a scale from zero to one, can be obtained with the diffuseness estimator defined in [45].
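The diffuseness estimator itself is not reproduced in this summary. A common relation to the CDR is D = 1/(1 + CDR), so D approaches 0 for purely coherent bins and 1 for purely diffuse ones. The sketch below assumes that relation, an ideal spherically isotropic diffuse field with coherence Γn(f) = sin(2πfd/c)/(2πfd/c) between microphones d apart, and the mixing model Γx = (CDR·Γs + Γn)/(CDR + 1) with only |Γs| = 1 known. Taking the squared magnitude of CDR·Γs = (CDR + 1)·Γx − Γn gives a quadratic in the CDR whose non-negative root is returned. This is one direction-independent estimator, not necessarily the one defined in [45] or used in the paper; the STFT size, hop, smoothing factor `alpha`, and all function names are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def stft(x, n_fft=512, hop=256):
    """Minimal Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def spatial_coherence(X1, X2, alpha=0.9):
    """Short-time complex coherence from recursively averaged auto- and cross-PSDs."""
    phi11 = np.zeros(X1.shape[1])
    phi22 = np.zeros(X1.shape[1])
    phi12 = np.zeros(X1.shape[1], dtype=complex)
    gamma = np.empty(X1.shape, dtype=complex)
    for t in range(X1.shape[0]):
        phi11 = alpha * phi11 + (1 - alpha) * np.abs(X1[t]) ** 2
        phi22 = alpha * phi22 + (1 - alpha) * np.abs(X2[t]) ** 2
        phi12 = alpha * phi12 + (1 - alpha) * X1[t] * np.conj(X2[t])
        gamma[t] = phi12 / np.sqrt(np.maximum(phi11 * phi22, 1e-20))
    return gamma


def diffuse_coherence(freqs, spacing_m, c=SPEED_OF_SOUND):
    """Coherence of an ideal spherically isotropic diffuse field."""
    # np.sinc(x) = sin(pi x)/(pi x), so x = 2 f d / c yields sin(2 pi f d/c)/(2 pi f d/c).
    return np.sinc(2.0 * freqs * spacing_m / c)


def estimate_cdr(gamma_x, gamma_n):
    """Non-negative root of the quadratic obtained from |CDR*Gamma_s|^2 = CDR^2."""
    a = np.minimum(np.abs(gamma_x) ** 2, 1.0 - 1e-6)  # keeps a - 1 away from zero
    b = np.real(gamma_x)
    g = gamma_n
    disc = g**2 * b**2 - g**2 * a + g**2 - 2 * g * b + a
    cdr = (g * b - a - np.sqrt(np.maximum(disc, 0.0))) / (a - 1.0)
    return np.maximum(cdr, 0.0)


def diffuseness(cdr):
    """Diffuseness on a 0..1 scale: 0 for fully coherent, 1 for fully diffuse bins."""
    return 1.0 / (1.0 + cdr)
```

A bin whose measured coherence equals the diffuse-field coherence yields CDR ≈ 0 and D ≈ 1, while a bin with |Γx| ≈ 1 yields a very large CDR and D close to 0, matching the zero-to-one scale described above.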

PROPOSED SOUND SOURCE LOCALIZATION METHOD
EXPERIMENTAL RESULTS
CONCLUSION
