Abstract

The performance of traditional direction-of-arrival (DOA) estimation algorithms degrades greatly in noisy and reverberant environments. Recently, deep learning has been applied to sound source localization and has substantially improved the robustness of DOA estimation. In this paper, we propose a sound source localization approach based on deep learning-based enhancement of the steering vector phase difference. The steering vectors and their estimation reliability functions (ERFs) are first estimated under the guidance of time-frequency masks predicted by a deep neural network (DNN). The phase difference of the steering vectors is then enhanced by a second DNN model, which is trained with an ERF-weighted mean square error (MSE) loss. Finally, the DOA of the sound source is determined by ERF-weighted histogram analysis. Experimental results with various types and levels of noise and various reverberant conditions show that the proposed approach outperforms state-of-the-art sound source localization algorithms in both utterance-level and frame-level DOA estimation.
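The two ERF-weighted steps named above can be illustrated with a minimal sketch. The abstract does not define the ERF, the loss normalization, or the histogram resolution, so everything below is an assumption made for illustration: the ERF is treated as one non-negative weight per estimate, the loss is normalized by the total weight, and the function names, the 5-degree bin width, and the 0 to 180 degree range are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def erf_weighted_mse(predicted: Sequence[float],
                     target: Sequence[float],
                     erf: Sequence[float]) -> float:
    """Weighted MSE: sum(w * (p - t)^2) / sum(w).

    Assumption: normalizing by the total weight; the paper may differ.
    """
    total_weight = sum(erf)
    if total_weight == 0:
        return 0.0
    weighted_sq_err = sum(w * (p - t) ** 2
                          for p, t, w in zip(predicted, target, erf))
    return weighted_sq_err / total_weight


def erf_weighted_histogram_doa(doa_candidates_deg: Sequence[float],
                               erf: Sequence[float],
                               bin_width_deg: float = 5.0,
                               min_deg: float = 0.0,
                               max_deg: float = 180.0) -> float:
    """Return the center of the bin that accumulates the most ERF weight.

    Each candidate DOA votes with its reliability weight instead of a
    count of 1, so unreliable estimates contribute little to the peak.
    """
    n_bins = int(round((max_deg - min_deg) / bin_width_deg))
    hist = [0.0] * n_bins
    for doa, weight in zip(doa_candidates_deg, erf):
        if not (min_deg <= doa <= max_deg):
            continue  # ignore candidates outside the search range
        idx = min(int((doa - min_deg) / bin_width_deg), n_bins - 1)
        hist[idx] += weight
    best_bin = max(range(n_bins), key=hist.__getitem__)
    return min_deg + (best_bin + 0.5) * bin_width_deg


# Hypothetical data: three reliable estimates near 43 degrees and four
# unreliable ones near 121 degrees.
candidates = [42.0, 43.0, 44.0, 120.0, 121.0, 122.0, 123.0]
weights = [0.9, 0.8, 0.9, 0.1, 0.1, 0.1, 0.1]
estimated_doa = erf_weighted_histogram_doa(candidates, weights)
loss = erf_weighted_mse([1.0, 2.0], [0.0, 2.0], [1.0, 3.0])
```

In this toy case an unweighted histogram would pick the bin around 121 degrees, because it holds four votes against three, whereas the weighted version selects the bin containing the reliable estimates. That is the intuition behind weighting by reliability, although the paper's actual procedure may differ in its details.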
