DOA Estimation for Multiple Speech Sources Based on Flexible Single-Source Zones and Concentration Weighting

Jiale Lin, Zhao Zhao, Zhiyong Xu, Hongrui Kan

doi:10.1109/jsen.2023.3263861

Abstract

Direction of arrival (DOA) estimation is key to many audio applications. Recently, sparse component analysis (SCA)-based methods have attracted much attention. These methods usually detect single-source points (SSPs) and single-source zones (SSZs), where one source dominates the others in the time-frequency domain, and use them to construct a pooled histogram containing multi-source DOA information. Nonetheless, the SSZ size in existing methods is fixed and empirically predetermined, so it cannot accommodate the varying spectro-temporal properties of speech sources. Furthermore, a higher SSP concentration in an SSZ implies a locally stronger dominant source and thus more reliable DOA information extracted from that zone, yet this has not been taken into account either. To address these problems, this paper presents a DOA estimation algorithm for multiple speech sources based on flexible SSZs and concentration weighting. First, in each frame, correlation coefficients of time delay vectors across adjacent frequency bins are calculated to identify SSPs, followed by the construction of flexible SSZs from a varying number of SSPs located at consecutive frequency bins. Next, the number of SSPs in each flexible SSZ is taken as a proxy for its concentration degree and used as the weighting factor when forming the pooled histogram. Finally, a matching pursuit (MP)-based approach is used to obtain the multi-source DOA estimates. Simulation results reveal that the proposed method significantly outperforms existing approaches in terms of the noise floor in the pooled histogram, angular resolution, and performance under various signal-to-noise ratios and reverberation conditions. Real-world experiments also verify its effectiveness and demonstrate considerably reduced computational complexity.
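
The zone construction and concentration weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that SSP detection has already produced a boolean mask over the frequency bins of one frame and a per-bin DOA estimate in degrees. The function names, the circular-mean choice for a zone's DOA, the 5-degree histogram resolution, and the `min_len` threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def flexible_zones(ssp_mask):
    """Group consecutive SSP bins into runs, returned as (start, stop) with stop exclusive."""
    mask = np.asarray(ssp_mask, dtype=bool)
    # Pad with False so runs touching either end still produce two edges.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))

def pooled_histogram(ssp_mask, doa_deg, n_bins=72, min_len=2):
    """Accumulate one weighted vote per zone; the weight is the zone's SSP count."""
    doa_deg = np.asarray(doa_deg, dtype=float)
    hist = np.zeros(n_bins)
    width = 360.0 / n_bins
    for start, stop in flexible_zones(ssp_mask):
        count = stop - start
        if count < min_len:  # assumed threshold: skip isolated SSPs
            continue
        # Assumption: represent the zone by the circular mean of its per-bin DOAs.
        ang = np.deg2rad(doa_deg[start:stop])
        mean = np.rad2deg(np.arctan2(np.sin(ang).mean(), np.cos(ang).mean())) % 360.0
        hist[int(mean // width) % n_bins] += count
    return hist

# Toy frame: nine frequency bins, three runs of SSPs.
mask = [False, True, True, True, False, True, False, True, True]
doa = [0, 40, 42, 44, 0, 200, 0, 120, 124]
zones = flexible_zones(mask)
hist = pooled_histogram(mask, doa)
```

In this toy frame the zone sizes differ (three bins, one bin, two bins), which is the sense in which the zones are flexible rather than fixed in size. The larger zone contributes a larger vote to the histogram, reflecting the idea that a higher SSP concentration indicates more reliable DOA information.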

Full Text