Abstract
Multi-channel acoustic source localization evaluates direction-dependent inter-microphone differences in order to estimate the position of an acoustic source embedded in an interfering sound field. Here we investigate a deep neural network (DNN) approach to source localization that improves on previous work with learned, linear support-vector-machine localizers. DNNs with depths between 4 and 15 layers were trained to predict the azimuth direction of target speech in 72 directional bins of width 5 degrees, embedded in an isotropic, multi-speech-source noise field. Several system parameters were varied; in particular, the number of microphones in the bilateral hearing aid scenario was set to 2, 4, and 6, respectively.
Results show that DNNs provide a clear improvement in localization performance over a linear classifier reference system. Increasing the number of microphones from 2 to 4 results in a larger performance increase for the DNNs than for the linear system. However, 6 microphones provide only a small additional gain. The DNN architectures perform better with 4 microphones than the linear approach does with 6 microphones, thus indicating that location-specific information in source-interference scenarios is encoded non-linearly in the sound field.
Highlights
The human auditory system routinely performs acoustic source localization, a task that is important in technical systems since it permits detection of relevant events such as speech, facilitates reconfiguration of spatial signal processing, and may trigger subsequent actions such as obstacle avoidance in robots.
Location-specific information as measured with multi-channel microphone arrays is encoded in relative transfer functions (RTFs, [8]), dominated by, but not limited to, time differences of arrival (TDOA) of the direct-path component of the acoustic signal.
Classic approaches to source localization are based on TDOA analysis, which commonly uses the generalized cross-correlation (GCC) method to yield robust TDOA estimates [2, 9].
The results show that deep neural network (DNN) processing obtains a larger benefit from additional microphones compared to the linear network Net R.
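As an illustrative sketch (not the paper's implementation), the generalized cross-correlation method mentioned above, in its widely used phase-transform (PHAT) variant, can be written as follows; signal lengths and the maximum-lag parameter here are assumptions for illustration:

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the TDOA of y relative to x via GCC-PHAT.

    The phase transform (PHAT) whitens the cross-spectrum, keeping only
    phase information, which makes the correlation peak more robust
    against reverberation than plain cross-correlation.
    """
    n = len(x) + len(y)                   # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # re-order so that lag 0 sits in the middle of the correlation vector
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                     # delay of y relative to x in seconds
```

For a pair of hearing-aid microphones, the sign of the returned delay already indicates the lateral hemisphere of the source; the DNN approach instead learns such mappings directly from the data.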
Summary
The human auditory system routinely performs acoustic source localization, a task that is important in technical systems since it permits detection of relevant events such as speech, facilitates reconfiguration of (auditory) spatial signal processing, and may trigger subsequent actions such as obstacle avoidance in robots. Location-specific information as measured with multi-channel microphone arrays is encoded in relative transfer functions (RTFs, [8]), dominated by, but not limited to, time differences of arrival (TDOA) of the direct-path component of the acoustic signal. The present work evaluates a non-linear extension of an earlier linear approach [5] by employing deep feed-forward networks that learn the transformation from multi-channel audio signals to a probabilistic location map.
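The transformation described above, from a multi-channel feature vector to a probabilistic map over 72 azimuth bins of 5-degree width, can be sketched as a plain feed-forward network. The layer sizes, input feature dimension, and initialization below are illustrative assumptions, not the configuration reported in the paper:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class AzimuthDNN:
    """Feed-forward net mapping a multi-channel feature vector to a
    probabilistic map over 72 azimuth bins (5-degree resolution).

    Hidden-layer widths and the input dimension are hypothetical.
    """
    def __init__(self, in_dim=256, hidden=(512, 512, 512, 512), n_bins=72, seed=0):
        rng = np.random.default_rng(seed)
        dims = (in_dim, *hidden, n_bins)
        # He initialization for ReLU layers
        self.weights = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def forward(self, x):
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = np.maximum(x @ W + b, 0.0)      # ReLU hidden layers
        return softmax(x @ self.weights[-1] + self.biases[-1])

    def predict_azimuth(self, x):
        # centre of the most probable 5-degree bin, in degrees
        return int(np.argmax(self.forward(x))) * 5
```

Treating localization as classification over discrete bins, rather than as regression of a single angle, is what yields the probabilistic location map: the softmax output assigns a posterior probability to each of the 72 directions.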
More From: Proceedings of the Northern Lights Deep Learning Workshop