Abstract
Speaker localization with microphone arrays has received significant attention in the past decade as a means for automated tracking of individuals in a closed space for videoconferencing, directed speech capture, and surveillance systems. Traditional techniques estimate the relative time differences of arrival (TDOA) between channels using the cross-correlation function. As we show in the context of speaker localization, these estimates yield poor results due to the joint effect of reverberation and the directivity of sound sources. In this paper, we present a novel method that utilizes a priori acoustic information about the monitored region, making it possible to localize directional sound sources while taking the effect of reverberation into account. The proposed method shows a significant performance improvement over traditional methods under "noise-free" conditions. Further work is required to extend its capabilities to noisy environments.
Highlights
The inverse problem of localizing a source by using signal measurements at an array of sensors is a classical problem in signal processing, with applications in sonar, radar, and acoustic engineering
A novel time difference of arrival (TDOA)-based sound source localization algorithm was presented which integrates a priori information about the acoustic environment to localize directional sound sources in reverberant environments
The algorithm utilizes the redundant information provided by multiple sensors to enhance the TDOA performance
Summary
The inverse problem of localizing a source from signal measurements at an array of sensors is a classical problem in signal processing, with applications in sonar, radar, and acoustic engineering. Many new ideas have been proposed to deal more effectively with noise and reverberation by taking advantage of the nature of the speech signal [14, 15] or by utilizing redundant information from multiple sensor pairs [11, 16–18]. Another interesting approach is to utilize the impulse response functions from the source to the microphones. One such technique is high-resolution spectral estimation [2, 3], where the transfer functions are estimated blindly by an adaptive algorithm intended to find the eigenvalues of the cross-correlation matrix; the more accurate this estimate is, the better the relative delay between the two microphone signals can be estimated. We also consider the effect of source directivity on localization performance; our system can accurately localize nonisotropic sound sources (e.g., human speakers) as well, without being limited by their orientation.
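The cross-correlation-based TDOA estimation that traditional techniques rely on can be sketched as follows. This is a minimal, self-contained illustration of the generalized cross-correlation with phase transform (GCC-PHAT) applied to a synthetic pair of microphone signals; it is not the paper's proposed method, and the function name, sampling rate, and signal model are illustrative assumptions only.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the TDOA (in seconds) of `sig` relative to `ref`
    using generalized cross-correlation with phase transform."""
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)            # cross-correlation in the lag domain
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center zero lag
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic example: a broadband source and a copy delayed by 20 samples,
# mimicking the path-length difference between two microphones.
fs = 16000
rng = np.random.default_rng(0)
src = rng.standard_normal(fs)            # 1 s of white, speech-bandwidth noise
delay = 20                               # true inter-channel offset in samples
mic1 = src
mic2 = np.concatenate((np.zeros(delay), src[:-delay]))
tdoa = gcc_phat(mic2, mic1, fs)
print(round(tdoa * fs))                  # estimated delay in samples
```

In an anechoic, noise-free setting the correlation peak sits exactly at the true lag; as the abstract notes, reverberation and source directivity smear or displace this peak, which is what motivates incorporating a priori acoustic information instead of relying on the peak alone.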