Abstract

We present a multi-microphone, multi-speaker direction of arrival (DOA) tracking algorithm. In the proposed algorithm, the DOA values are discretized to a set of candidate DOAs. Accordingly, and following the W-disjoint orthogonality (WDO) property of the speech signal, each time-frequency (TF) bin in the short-time Fourier transform (STFT) domain is associated with a single DOA candidate. The conditional probability of each TF observation given its corresponding DOA association is modeled as a multivariate complex-Gaussian distribution, with the power spectral density (PSD) of each source treated as an unknown parameter. By applying the Fisher-Neyman factorization, it can be shown that this conditional probability is proportional to the signal-to-noise ratio (SNR) at the outputs of minimum variance distortionless response (MVDR) beamformers (BFs) steered towards all candidate DOAs. We model these observations either as a frequency-wise parallel Hidden Markov Model (HMM) or as a coupled HMM with coupling between adjacent frequency bins. The posterior probability of these associations is inferred by applying an extended forward-backward (FB) algorithm, and the actual DOAs are then estimated from this posterior. An experimental study demonstrates the benefits of the proposed algorithm on both a simulated dataset and real recordings drawn from the acoustic source localization and tracking (LOCATA) dataset.
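The inference step can be illustrated with a standard scaled forward-backward recursion for a single frequency bin, i.e. one chain of the frequency-wise parallel HMM. This is a minimal sketch under stated assumptions, not the authors' extended FB algorithm: it omits the coupling between adjacent frequency bins, and it assumes the per-frame likelihood of each DOA candidate (which the abstract relates to the MVDR beamformer output SNRs) is already available. The function name `forward_backward`, the "sticky" transition matrix and the toy likelihood values are all hypothetical.

```python
from typing import List


def forward_backward(
    likelihoods: List[List[float]],
    transition: List[List[float]],
    prior: List[float],
) -> List[List[float]]:
    """Posterior P(DOA candidate d at frame t | all frames) for ONE frequency bin.

    Hypothetical inputs (assumptions, not taken from the paper):
      likelihoods[t][d]: p(observation at frame t | DOA candidate d)
      transition[i][j]:  P(candidate j at frame t+1 | candidate i at frame t)
      prior[d]:          P(candidate d at frame 0)
    """
    T, D = len(likelihoods), len(prior)

    # Forward pass, normalized at every frame to avoid numerical underflow.
    alpha: List[List[float]] = []
    scales: List[float] = []
    a = [prior[d] * likelihoods[0][d] for d in range(D)]
    c = sum(a)
    alpha.append([x / c for x in a])
    scales.append(c)
    for t in range(1, T):
        a = [
            likelihoods[t][j]
            * sum(alpha[t - 1][i] * transition[i][j] for i in range(D))
            for j in range(D)
        ]
        c = sum(a)
        alpha.append([x / c for x in a])
        scales.append(c)

    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0] * D for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(
                transition[i][j] * likelihoods[t + 1][j] * beta[t + 1][j]
                for j in range(D)
            )
            / scales[t + 1]
            for i in range(D)
        ]

    # Posterior is proportional to alpha * beta; renormalize each frame.
    posterior = []
    for t in range(T):
        g = [alpha[t][d] * beta[t][d] for d in range(D)]
        s = sum(g)
        posterior.append([x / s for x in g])
    return posterior


# Toy usage: two DOA candidates, three STFT frames, transitions favoring "stay".
A = [[0.9, 0.1], [0.1, 0.9]]
lik = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
post = forward_backward(lik, A, [0.5, 0.5])
# Per-frame DOA estimate: the candidate with the highest posterior (MAP per frame).
estimates = [max(range(len(p)), key=p.__getitem__) for p in post]
```

In this toy run the third frame's likelihood favors candidate 1, but the smoothed posterior still prefers candidate 0, which shows how the transition model suppresses isolated jumps in the DOA track.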
