Semi-supervised source localization in reverberant environments using deep generative modeling

Michael J Bianco, Sharon Gannot, Peter Gerstoft, Efren Fernandez-Grande

doi:10.1121/1.5147419

Abstract

We present a method for acoustic source localization in reverberant environments based on semi-supervised machine learning (ML) with deep generative models. Source localization in the presence of reverberation remains a major challenge, which recent ML techniques have shown promise in addressing. Although data volumes are often large, the number of labels available for supervised learning in reverberant environments is usually small. In semi-supervised learning, ML systems are trained using many examples with only a few labels, with the goal of exploiting the natural structure of the data. We use variational autoencoders (VAEs), which are generative neural networks (NNs) that rely on explicit probabilistic representations, to model the latent distribution of reverberant acoustic data. VAEs consist of an encoder NN, which maps complex input distributions to simpler parametric distributions (e.g., Gaussian), and a decoder NN, which approximates the training examples. The VAE is trained to generate the phase of relative transfer functions (RTFs) between two microphones in reverberant environments, in parallel with a direction-of-arrival (DOA) classifier, on both labeled and unlabeled RTF samples. The performance of this VAE-based approach is compared with conventional and ML-based localization in simulated and real-world scenarios.
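As a rough illustration of the input feature named in the abstract, the following sketch estimates the RTF phase between two microphone signals. The abstract does not state how the RTFs are estimated, so the estimator here (frame-averaged cross-spectrum divided by the reference auto-spectrum) is an assumption, as are the function name `rtf_phase` and the parameters `n_fft` and `hop`. The demo signal is a synthetic one-sample delay, not reverberant data.

```python
import numpy as np

def rtf_phase(x1, x2, n_fft=256, hop=128):
    """Estimate the RTF phase of microphone 2 relative to microphone 1.

    Assumed estimator (not taken from the paper): average the
    cross-spectrum X2 * conj(X1) and the auto-spectrum |X1|^2 over
    Hann-windowed frames, then take the angle of their ratio.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x1) - n_fft) // hop
    cross = np.zeros(n_fft // 2 + 1, dtype=complex)
    auto = np.zeros(n_fft // 2 + 1)
    for m in range(n_frames):
        seg = slice(m * hop, m * hop + n_fft)
        X1 = np.fft.rfft(window * x1[seg])
        X2 = np.fft.rfft(window * x2[seg])
        cross += X2 * np.conj(X1)
        auto += np.abs(X1) ** 2
    rtf = cross / (auto + 1e-12)  # small constant avoids division by zero
    return np.angle(rtf)          # phase in radians, one value per frequency bin

# Synthetic check: microphone 2 hears microphone 1 delayed by one sample.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(16000)
x2 = np.roll(x1, 1)
phase = rtf_phase(x1, x2)

# For a pure delay of d samples the ideal phase is -2*pi*k*d/n_fft at bin k.
k = np.arange(256 // 2 + 1)
expected = -2 * np.pi * k * 1 / 256
```

In an anechoic pure-delay case like this one the phase is linear in frequency; reverberation perturbs that pattern, which is the structure the abstract proposes to model with the VAE.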

Full Text