Subband processing‐based approach for the localisation of two simultaneous speakers

Ali Dehghan Firoozabadi, Hamid Reza Abutalebi

doi:10.1049/iet-spr.2013.0475

Abstract

Time difference of arrival (TDOA)-based techniques are a major category of speaker localisation methods, and a large subcategory of them uses the generalised cross-correlation (GCC) to estimate the TDOA. In this study, the authors propose a subband processing-based method that computes the GCC of each microphone pair in every subband. It then combines the information from the different subbands to estimate the directions of two simultaneous speakers. Conventional methods use the whole signal spectrum for localisation. The proposed method instead exploits the differences between the speakers' frequency contents. For each microphone pair, it computes histograms of the peak positions of the GCC curve in the different subbands. These histograms are then fused using one of three proposed averaging methods: simple, sectional, and weighted averaging. The method has been evaluated on simulated and real speech data in noisy, reverberant, and noisy–reverberant conditions. The results show that the proposed subband processing-based method outperforms its full-band counterpart. The authors’ experiments also show that, among the histogram averaging methods, weighted averaging performs best in estimating the directions of the speakers.
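To make the pipeline concrete, here is a minimal sketch for a single microphone pair. Each frame's cross-spectrum is restricted to one subband at a time, the GCC peak lag is recorded in that subband's histogram, and the subband histograms are then fused. The two largest fused bins are taken as the two TDOAs. Several details are assumptions, not the authors' settings: the PHAT weighting of the GCC, the Hann window, frame and hop sizes, band edges, and the peak-picking rule. The energy-based weights in the hypothetical `band_energy_weights` helper are also only an illustrative choice and may differ from the authors' weighted averaging. Sectional averaging is not shown because the abstract does not define it, and the step that maps TDOAs from several pairs to speaker directions is omitted.

```python
import numpy as np


def phat_cross_spectrum(f1, f2):
    """PHAT-weighted cross-spectrum of two frames (zero-padded to avoid wrap-around)."""
    n = 2 * len(f1)
    cross = np.fft.rfft(f1, n) * np.conj(np.fft.rfft(f2, n))
    return cross / (np.abs(cross) + 1e-12)   # keep phase only (PHAT: an assumption)


def band_gcc(cross, freqs, band, max_lag):
    """GCC restricted to band=(lo, hi) Hz, for lags -max_lag..max_lag samples.
    A peak at lag d means the first signal lags the second by d samples."""
    lo, hi = band
    masked = np.where((freqs >= lo) & (freqs < hi), cross, 0.0)
    cc = np.fft.irfft(masked, 2 * (len(cross) - 1))
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def subband_histograms(x1, x2, fs, bands, max_lag, frame=1024, hop=512):
    """Per-subband histograms of the per-frame GCC peak lag (one row per band)."""
    win = np.hanning(frame)
    freqs = np.fft.rfftfreq(2 * frame, d=1.0 / fs)
    hists = np.zeros((len(bands), 2 * max_lag + 1))
    for start in range(0, len(x1) - frame + 1, hop):
        cross = phat_cross_spectrum(win * x1[start:start + frame],
                                    win * x2[start:start + frame])
        for b, band in enumerate(bands):
            hists[b, np.argmax(band_gcc(cross, freqs, band, max_lag))] += 1
    return hists


def band_energy_weights(x, fs, bands):
    """Hypothetical weighting: share of the signal power falling in each subband."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    e = np.array([power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])
    return e / e.sum()


def fuse(hists, weights=None):
    """Simple averaging (weights=None) or weighted averaging over subbands."""
    if weights is None:
        return hists.mean(axis=0)
    return weights @ hists / weights.sum()


def two_largest_peaks(hist, max_lag, min_sep=2):
    """Lags of the two highest bins that are more than min_sep samples apart."""
    h = hist.astype(float)
    first = int(np.argmax(h))
    h[max(0, first - min_sep):first + min_sep + 1] = -np.inf
    return first - max_lag, int(np.argmax(h)) - max_lag


# Synthetic check: two band-limited noise "speakers" with known delays.
fs, max_lag, n = 16000, 10, 16000
rng = np.random.default_rng(0)


def bandlimited_noise(lo, hi):
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n)


s_a = bandlimited_noise(300, 2000)   # low-frequency talker: mic 1 lags by 4 samples
s_b = bandlimited_noise(3000, 6000)  # high-frequency talker: mic 1 leads by 6 samples
mic2 = s_a + s_b + 0.01 * rng.standard_normal(n)
mic1 = np.roll(s_a, 4) + np.roll(s_b, -6) + 0.01 * rng.standard_normal(n)

bands = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 4000),
         (4000, 5000), (5000, 6000), (6000, 8001)]
hists = subband_histograms(mic1, mic2, fs, bands, max_lag)
est = two_largest_peaks(fuse(hists, band_energy_weights(mic2, fs, bands)), max_lag)
print(sorted(est))
```

In this toy setup, each talker dominates different subbands, so each subband histogram concentrates on a single lag. Subbands that contain only sensor noise get near-zero weight under the assumed energy weighting. This mirrors, in a simplified form, the intuition that subband processing can separate speakers whose frequency contents differ.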
