Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking

Zhong-Qiu Wang, Deliang Wang, Xueliang Zhang

doi:10.1109/taslp.2018.2876169

Abstract

Deep learning-based time-frequency (T-F) masking has dramatically advanced monaural (single-channel) speech separation and enhancement. This study investigates its potential for direction of arrival (DOA) estimation in noisy and reverberant environments. We explore ways of combining T-F masking and conventional localization algorithms, such as generalized cross correlation with phase transform, as well as newly proposed algorithms based on steered-response SNR and steering vectors. The key idea is to utilize deep neural networks (DNNs) to identify speech-dominant T-F units containing relatively clean phase for DOA estimation. Our DNN is trained using only monaural spectral information, and this makes the trained model directly applicable to arrays with various numbers of microphones arranged in diverse geometries. Although only monaural information is used for training, experimental results show strong robustness of the proposed approach in new environments with intense noise and room reverberation, outperforming traditional DOA estimation methods by large margins. Our study also suggests that the ideal ratio mask and its variants remain effective training targets for robust speaker localization.
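
The key idea above, weighting each T-F unit by how speech-dominant it is before accumulating phase evidence, can be illustrated with a minimal sketch of mask-weighted generalized cross correlation with phase transform for a two-microphone pair. This is not the authors' exact formulation: the function names (`stft`, `mask_weighted_gcc_phat`, `delay_to_angle`), the integer-delay search, and the default parameters are assumptions made for illustration, and the mask is passed in as a plain array standing in for a DNN estimate.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)

def mask_weighted_gcc_phat(X1, X2, mask, n_fft=512, max_delay=16):
    """Estimate the integer delay (in samples) of channel 2 relative to channel 1.

    X1, X2 : complex STFTs of shape (frames, bins).
    mask   : real weights in [0, 1], same shape (assumed to come from a DNN).
    """
    cross = X1 * np.conj(X2)
    phat = cross / np.maximum(np.abs(cross), 1e-12)  # discard magnitude, keep phase
    weighted = (mask * phat).sum(axis=0)             # pool masked phase over frames
    k = np.arange(X1.shape[1])                       # frequency-bin indices
    delays = np.arange(-max_delay, max_delay + 1)    # candidate delays in samples
    # steering[d, k] compensates the phase a delay of `delays[d]` samples would cause
    steering = np.exp(-2j * np.pi * np.outer(delays, k) / n_fft)
    scores = np.real(steering @ weighted)            # one score per candidate delay
    return int(delays[np.argmax(scores)])

def delay_to_angle(delay, fs=16000, mic_distance=0.2, c=343.0):
    """Far-field angle in degrees from broadside for a two-microphone pair.

    fs, mic_distance (metres) and c (speed of sound, m/s) are illustrative defaults.
    """
    sin_theta = np.clip(delay * c / (fs * mic_distance), -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Usage: a synthetic source reaching microphone 2 three samples after microphone 1.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(8000)
x2 = np.roll(x1, 3)
X1, X2 = stft(x1), stft(x2)
mask = np.ones(X1.shape)  # placeholder: every T-F unit is trusted equally
estimated_delay = mask_weighted_gcc_phat(X1, X2, mask)
```

With an all-ones mask this reduces to ordinary GCC-PHAT pooled over frames. Replacing the placeholder with an estimated mask down-weights noise-dominated T-F units, and because the mask is computed per channel from spectral information only, the same weighting can be reused for any microphone pair.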

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2019
Citations: 91
License type: publisher-specific, author manuscript

Similar Papers

Robust online direction of arrival estimation using low dimensional spherical harmonic features
Vishnuvardhan Varanasi ... Rajesh Hegde
01 Mar 2017

DOA estimation of multiple speech sources by selecting reliable local sound intensity estimates
Shaowei Ding ... Huawei Chen
Applied Acoustics | VOL. 127
10 Jul 2017

Adaptive Multichannel Time Delay Estimation Based on Blind System Identification for Acoustic Source Localization
Yiteng Huang ... Jacob Benesty
01 Jan 2003

GCC-based DoA estimation of overlapping muzzleblast and shockwave components of gunshot signals
Izabela L Freire ... Jose A Apolinario
01 Feb 2011
