Abstract

This paper describes a system that localizes and detects sound sources from a stereo recording in the presence of interfering noise and reverberation. First, a binaural front-end based on computational auditory scene analysis (CASA) extracts the binaural cues: interaural time differences (ITDs) and interaural level differences (ILDs). Second, exploiting the probabilistic nature of these cues, the joint ITD-ILD feature space is modeled with Gaussian mixture models (GMMs) to compute the probability density functions (PDFs) of the time-frequency units, and the speech source is localized with a Bayesian maximum a posteriori (MAP) estimate. Third, a binary mask is estimated from the Bayesian analysis to detect the speech. The system is evaluated in both simulated acoustic conditions and real rooms. The results show that the proposed method achieves good speech localization performance.
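
The localization and masking steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no model parameters or masking rule, so the diagonal-covariance GMMs in `MODELS`, the toy feature values in `FEATURES`, the uniform prior, and the rule "keep a time-frequency unit if the target azimuth is its most likely model" are all assumptions made for illustration. The CASA front-end is omitted; each feature is assumed to be an already extracted (ITD in ms, ILD in dB) pair for one time-frequency unit.

```python
import math

def diag_gmm_logpdf(x, gmm):
    """Log-density of one feature x = (itd_ms, ild_db) under a diagonal GMM.

    gmm is a list of (weight, mean, variance) components (hypothetical format).
    """
    terms = []
    for weight, mean, var in gmm:
        log_norm = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
        log_exp = -0.5 * sum((xi - mi) ** 2 / vi
                             for xi, mi, vi in zip(x, mean, var))
        terms.append(math.log(weight) + log_norm + log_exp)
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def map_azimuth(features, models, priors=None):
    """Return the azimuth maximizing log prior + summed log-likelihood."""
    if priors is None:  # assumed uniform prior over candidate azimuths
        priors = {az: 1.0 / len(models) for az in models}
    scores = {
        az: math.log(priors[az])
        + sum(diag_gmm_logpdf(x, gmm) for x in features)
        for az, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

def binary_mask(features, models, target_az):
    """Assumed rule: 1 where the target azimuth is the unit's best model."""
    mask = []
    for x in features:
        best = max(models, key=lambda az: diag_gmm_logpdf(x, models[az]))
        mask.append(1 if best == target_az else 0)
    return mask

# Hypothetical per-azimuth models: azimuth in degrees -> GMM components.
MODELS = {
    -30: [(1.0, (-0.25, -4.0), (0.01, 4.0))],
    0: [(0.5, (0.0, 0.0), (0.01, 4.0)), (0.5, (0.0, 0.0), (0.04, 16.0))],
    30: [(1.0, (0.25, 4.0), (0.01, 4.0))],
}

# Toy (ITD ms, ILD dB) features: three units near +30 degrees, one near 0.
FEATURES = [(0.24, 3.5), (0.27, 4.8), (0.01, 0.3), (0.22, 5.1)]

if __name__ == "__main__":
    azimuth, _ = map_azimuth(FEATURES, MODELS)
    print("estimated azimuth:", azimuth)
    print("binary mask:", binary_mask(FEATURES, MODELS, azimuth))
```

Summing log-likelihoods over units treats the time-frequency units as independent given the source direction, which is a simplifying assumption of this sketch. In practice the GMMs would be trained on binaural cues measured for each candidate direction rather than written by hand.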
