Abstract

Localization in reverberant environments remains an open challenge. Recently, supervised learning approaches have demonstrated very promising results in addressing reverberation. However, even with large data volumes, the number of labels available for supervised learning in such environments is usually small. We propose to address this issue with a semi-supervised learning (SSL) approach, based on deep generative modeling. Our chosen deep generative model, the variational autoencoder (VAE), is trained to generate the phase of relative transfer functions (RTFs) between microphones. In parallel, a direction of arrival (DOA) classifier network based on RTF-phase is also trained. The joint generative and discriminative model, deemed VAE-SSL, is trained using labeled and unlabeled RTF-phase sequences. In learning to generate and classify the sequences, the VAE-SSL extracts the physical causes of the RTF-phase (i.e., source location) from distracting signal characteristics such as noise and speech activity. This facilitates effective end-to-end operation of the VAE-SSL, which requires minimal preprocessing of RTF-phase. VAE-SSL is compared with two signal processing-based approaches, steered response power with phase transform (SRP-PHAT) and MUltiple SIgnal Classification (MUSIC), as well as fully supervised CNNs. The approaches are compared using data from two real acoustic environments, one of which was recently obtained at the Technical University of Denmark specifically for our study. We find that VAE-SSL can outperform the conventional approaches and the CNN in label-limited scenarios. Further, the trained VAE-SSL system can generate new RTF-phase samples that capture the physics of the acoustic environment. Thus, the generative modeling in VAE-SSL provides a means of interpreting the learned representations. To the best of our knowledge, this paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling.
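
As a concrete illustration of the RTF-phase features referenced above, the sketch below estimates the phase of a relative transfer function between a microphone and a reference microphone from time-domain recordings. The cross-power spectral density (CPSD) ratio estimator, the sampling rate, and the segment length are assumptions for illustration, not necessarily the preprocessing used in the paper.

```python
# Minimal sketch: estimating RTF-phase between microphone m and a reference
# microphone. The CPSD-ratio estimator and parameters below are illustrative
# assumptions, not the authors' exact pipeline.
import numpy as np
from scipy.signal import csd

def rtf_phase(x_m, x_ref, fs=16000, nperseg=512):
    """Phase of the RTF H_m(f) relating the reference mic to mic m (X_m = H_m * X_ref)."""
    _, s_ref_m = csd(x_ref, x_m, fs=fs, nperseg=nperseg)      # E[X_ref^* X_m]
    _, s_ref_ref = csd(x_ref, x_ref, fs=fs, nperseg=nperseg)  # E[|X_ref|^2]
    h = s_ref_m / (s_ref_ref + 1e-12)                         # Welch-averaged RTF estimate
    return np.angle(h)                                        # wrapped phase in (-pi, pi]

# Example with synthetic signals: the second mic is a delayed copy of the first,
# so the RTF-phase is approximately linear in frequency.
rng = np.random.default_rng(0)
x_ref = rng.standard_normal(16000)
x_m = np.roll(x_ref, 3)            # 3-sample delay
phi = rtf_phase(x_m, x_ref)
```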

Highlights

  • Source localization is an important problem in acoustics and many related fields

  • For the VAE-based semi-supervised learning (VAE-SSL) approach, we find that the system generalizes well to the DTU environment using only a few labels, since it extracts physically meaningful features

  • We have proposed a semi-supervised approach to acoustic source localization in reverberant environments based on deep generative modeling with VAEs, which we deem VAE-SSL (a minimal illustrative sketch follows these highlights)
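
The following is a minimal sketch of the joint generative/discriminative idea behind VAE-SSL: a DOA classifier q(y|x), an encoder q(z|x,y), and a decoder p(x|z,y), trained on labeled and unlabeled RTF-phase features with a semi-supervised VAE objective. The feature dimension, number of DOA classes, latent size, layer widths, Gaussian reconstruction term, and hyperparameters are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of semi-supervised VAE training with a DOA classifier, in the
# spirit of VAE-SSL as summarized above. Dimensions, layer widths, and the
# unit-variance Gaussian likelihood on RTF-phase are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

D, K, Z = 257, 19, 16  # assumed: RTF-phase bins, DOA classes, latent dimension

class VAESSL(nn.Module):
    def __init__(self):
        super().__init__()
        self.cls = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, K))        # q(y|x)
        self.enc = nn.Sequential(nn.Linear(D + K, 128), nn.ReLU(), nn.Linear(128, 2*Z))  # q(z|x,y)
        self.dec = nn.Sequential(nn.Linear(Z + K, 128), nn.ReLU(), nn.Linear(128, D))    # p(x|z,y)

    def elbo(self, x, y1h):
        mu, logvar = self.enc(torch.cat([x, y1h], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterization trick
        x_hat = self.dec(torch.cat([z, y1h], -1))
        rec = -0.5 * ((x - x_hat) ** 2).sum(-1)                     # log p(x|z,y), Gaussian up to a constant
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1)   # KL(q(z|x,y) || N(0, I))
        return rec - kl

    def loss(self, x_lab, y_lab, x_unl, alpha=1.0):
        # Labeled term: negative ELBO plus a supervised cross-entropy on the classifier.
        y1h = F.one_hot(y_lab, K).float()
        l_lab = -self.elbo(x_lab, y1h) + alpha * F.cross_entropy(self.cls(x_lab), y_lab, reduction="none")
        # Unlabeled term: marginalize the ELBO over y under q(y|x) and add its entropy.
        q_y = F.softmax(self.cls(x_unl), -1)
        ys = torch.eye(K, device=x_unl.device)
        elbos = torch.stack([self.elbo(x_unl, ys[k].expand(x_unl.size(0), K)) for k in range(K)], -1)
        l_unl = -(q_y * elbos).sum(-1) + (q_y * q_y.clamp_min(1e-8).log()).sum(-1)
        return l_lab.mean() + l_unl.mean()

# Usage sketch: one optimization step on labeled and unlabeled RTF-phase batches (random placeholders).
model = VAESSL()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_lab, y_lab, x_unl = torch.randn(8, D), torch.randint(0, K, (8,)), torch.randn(32, D)
opt.zero_grad()
model.loss(x_lab, y_lab, x_unl).backward()
opt.step()
```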

Introduction

Source localization is an important problem in acoustics and many related fields. The performance of localization algorithms is degraded by reverberation, which induces complex temporal arrival structure at sensor arrays. Despite extensive research [1]–[3], acoustic localization in reverberant environments remains a major challenge [4]. There has been great interest in machine learning (ML)-based techniques in acoustics, including source localization and event detection [5]–[14]. One difficulty for ML-based methods in acoustics is that, despite large volumes of recordings, the amount of labeled data is limited and acoustic propagation in natural environments is complex [1], [2]. This limitation has motivated recent approaches for localization based on semi-supervised learning (SSL) [15], [16].
