Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference

Longbiao Cheng, Yonghong Yan, Dingding Yao, Xingwei Sun, Junfeng Li

doi:10.1109/taslp.2020.3043107

Abstract

The performance of traditional direction-of-arrival (DOA) estimation algorithms degrades greatly in noisy and reverberant environments. Recently, deep learning has been applied to sound source localization and has substantially improved the robustness of DOA estimation. In this paper, we propose a sound source localization approach based on deep-learning-based enhancement of the steering vector phase difference. The steering vectors and their estimation reliability functions (ERFs) are first estimated under the guidance of time-frequency masks predicted by a deep neural network (DNN). The phase difference of the steering vectors is then enhanced by a second DNN model, which is trained with an ERF-weighted mean square error (MSE) loss. Finally, the DOA of the sound source is determined by ERF-weighted histogram analysis. Experimental results with various types and levels of noise and various reverberant conditions show that the proposed approach outperforms state-of-the-art sound source localization algorithms in both utterance-level and frame-level DOA estimation.
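The two ERF-weighted steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normalization of the loss by the sum of the weights, the 5-degree bin width, and the 0 to 180 degree search range are all assumptions made for illustration, and the wrapping of phase differences is ignored.

```python
import numpy as np


def erf_weighted_mse(pred, target, erf):
    """Squared error weighted element-wise by a reliability value.

    Assumption: the loss is normalized by the sum of the weights;
    the abstract does not state the normalization.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    erf = np.asarray(erf, dtype=float)
    denom = max(float(np.sum(erf)), 1e-12)  # guard against all-zero weights
    return float(np.sum(erf * (pred - target) ** 2) / denom)


def erf_weighted_histogram_doa(doa_deg, erf, bin_width=5.0, max_deg=180.0):
    """Return the center of the histogram bin with the largest total weight.

    Each DOA candidate (in degrees) contributes its reliability value
    instead of a unit count. Bin width and range are hypothetical.
    """
    doa_deg = np.asarray(doa_deg, dtype=float).ravel()
    erf = np.asarray(erf, dtype=float).ravel()
    edges = np.arange(0.0, max_deg + bin_width, bin_width)
    hist, edges = np.histogram(doa_deg, bins=edges, weights=erf)
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])


# Three reliable candidates near 30 degrees, four unreliable ones near 120.
candidates = [30.0, 32.0, 31.0, 120.0, 121.0, 122.0, 123.0]
reliability = [0.9, 0.8, 0.9, 0.1, 0.1, 0.1, 0.1]
estimate = erf_weighted_histogram_doa(candidates, reliability)
```

In this toy input an unweighted histogram would peak in the bin around 120 degrees, because it holds four candidates against three, whereas the reliability weights move the peak to the bin around 30 degrees.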

Full Text