Abstract

Time difference of arrival (TDOA)-based techniques are a major category of speaker localisation methods. A large subcategory of these methods uses the generalised cross-correlation (GCC) for TDOA estimation. In this study, the authors propose a subband processing-based method that computes the GCC of each microphone pair in every subband. The information collected from the different subbands is then combined to estimate the directions of two simultaneous speakers. Conventional methods use the whole signal spectrum in the localisation procedure. The proposed method instead exploits the differences in the speakers' frequency content. For each microphone pair, it computes histograms of the peak positions of the GCC curve in the different subbands. These histograms are then fused using one of three proposed histogram averaging methods: simple, sectional, and weighted averaging. The proposed method has been evaluated on simulated and real speech data in noisy, reverberant, and noisy–reverberant conditions. The evaluation results show that the proposed subband processing-based method outperforms its full-band counterpart. The authors' experiments also show that, among the histogram averaging methods, weighted averaging performs best in estimating the directions of the speakers.
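The Python/NumPy sketch below gives a rough illustration of the subband histogram idea. For one microphone pair, it computes a GCC curve in each subband of every frame. It then builds a histogram of the per-frame peak lags for each subband and fuses the subband histograms by averaging.

Several details are assumptions that the abstract does not give:

- the PHAT weighting of the GCC;
- equal-width FFT-bin subbands;
- the frame and hop sizes;
- the helper names (`subband_gcc_phat`, `subband_peak_histograms`, `fuse_histograms`, `pick_peak_lags`), which are hypothetical.

Only simple averaging and a generic weighted average with caller-supplied weights are shown. The authors' sectional averaging and their weighting rule are not specified here and are not reproduced.

```python
import numpy as np


def subband_gcc_phat(x1, x2, n_bands, max_lag):
    """GCC curves for one frame pair, one per subband (hypothetical helper).

    Assumes PHAT weighting, equal-width FFT-bin subbands, equal-length
    frames and max_lag >= 1. Index max_lag + k of each curve holds lag k;
    a peak at k > 0 means x1 lags x2 by k samples.
    """
    n = 2 * len(x1)  # zero-pad so the correlation does not wrap around
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT: keep phase, discard magnitude
    edges = np.linspace(0, len(cross), n_bands + 1).astype(int)
    curves = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(cross)
        band[lo:hi] = cross[lo:hi]  # keep only this subband's bins
        cc = np.fft.irfft(band, n=n)
        # Reorder so the curve runs over lags -max_lag .. +max_lag.
        curves.append(np.concatenate((cc[-max_lag:], cc[:max_lag + 1])))
    return curves


def subband_peak_histograms(x1, x2, frame_len, hop, n_bands, max_lag):
    """Histogram of per-frame GCC peak lags, one row per subband."""
    hists = np.zeros((n_bands, 2 * max_lag + 1))
    for start in range(0, len(x1) - frame_len + 1, hop):
        f1 = x1[start:start + frame_len]
        f2 = x2[start:start + frame_len]
        for b, cc in enumerate(subband_gcc_phat(f1, f2, n_bands, max_lag)):
            hists[b, np.argmax(cc)] += 1  # vote for this frame's peak lag
    return hists


def fuse_histograms(hists, weights=None):
    """Simple average of subband histograms, or a weighted average.

    The weights are supplied by the caller; the paper's weighting rule
    is not reproduced here.
    """
    if weights is None:
        weights = np.ones(hists.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ hists / weights.sum()


def pick_peak_lags(fused, max_lag, n_speakers=2):
    """Lags (in samples) of the n_speakers largest local maxima."""
    padded = np.concatenate(([-np.inf], fused, [-np.inf]))
    is_peak = (fused > 0) & (fused >= padded[:-2]) & (fused >= padded[2:])
    candidates = np.flatnonzero(is_peak)
    best = candidates[np.argsort(fused[candidates])[::-1][:n_speakers]]
    return sorted((best - max_lag).tolist())
```

Each returned lag is a TDOA in samples for that microphone pair. Under a far-field assumption, a lag τ maps to a direction θ = arcsin(cτ / (f_s D)), where D is the microphone spacing, f_s the sampling rate and c the speed of sound. With two speakers whose spectra differ, different subbands tend to vote for different lags, so both speakers can appear as separate peaks in the fused histogram.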
