Abstract
Multi-channel acoustic source localization evaluates direction-dependent inter-microphone differences in order to estimate the position of an acoustic source embedded in an interfering sound field. Here we investigate a deep neural network (DNN) approach to source localization that improves on previous work with learned, linear support-vector-machine localizers. DNNs with depths between 4 and 15 layers were trained to predict the azimuth direction of target speech in 72 directional bins of width 5 degrees, embedded in an isotropic, multi-speech-source noise field. Several system parameters were varied; in particular, the number of microphones in the bilateral hearing aid scenario was set to 2, 4, and 6, respectively.
Results show that DNNs provide a clear improvement in localization performance over a linear classifier reference system. Increasing the number of microphones from 2 to 4 results in a larger performance increase for the DNNs than for the linear system. However, 6 microphones provide only a small additional gain. The DNN architectures perform better with 4 microphones than the linear approach does with 6 microphones, thus indicating that location-specific information in source-interference scenarios is encoded non-linearly in the sound field.
Highlights
The human auditory system routinely performs acoustic source localization, a task that is important in technical systems since it permits detection of relevant events such as speech, facilitates reconfiguration of spatial signal processing, and may trigger subsequent actions such as obstacle avoidance in robots.
Location-specific information as measured with multi-channel microphone arrays is encoded in relative transfer functions (RTFs, [8]), dominated by, but not limited to, time differences of arrival (TDOA) of the direct-path component of the acoustic signal.
Classic approaches to source localization are based on TDOA analysis, which commonly uses the generalized cross-correlation (GCC) method to yield robust TDOA estimates [2, 9].
The results show that deep neural network (DNN) processing obtains a larger benefit from additional microphones compared to the linear network Net R.
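As an illustrative sketch (not the paper's implementation), the generalized cross-correlation method mentioned above, in its widely used phase-transform (PHAT) variant, can be written as follows; signal lengths and the maximum-lag parameter here are assumptions for illustration:

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the TDOA of y relative to x via GCC-PHAT.

    The phase transform (PHAT) whitens the cross-spectrum, keeping only
    phase information, which makes the correlation peak more robust
    against reverberation than plain cross-correlation.
    """
    n = len(x) + len(y)                   # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # re-order so that lag 0 sits in the middle of the correlation vector
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                     # delay of y relative to x in seconds
```

For a pair of hearing-aid microphones, the sign of the returned delay already indicates the lateral hemisphere of the source; the DNN approach instead learns such mappings directly from the data.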
Summary
The human auditory system routinely performs acoustic source localization, a task that is important in technical systems since it permits detection of relevant events such as speech, facilitates reconfiguration of (auditory) spatial signal processing, and may trigger subsequent actions such as obstacle avoidance in robots. Location-specific information as measured with multi-channel microphone arrays is encoded in relative transfer functions (RTFs, [8]), dominated by, but not limited to, time differences of arrival (TDOA) of the direct-path component of the acoustic signal. The present work evaluates a non-linear extension of an earlier linear approach [5] by employing deep feed-forward networks that learn the transformation from multi-channel audio signals to a probabilistic location map.
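The transformation described above, from a multi-channel feature vector to a probabilistic map over 72 azimuth bins of 5-degree width, can be sketched as a plain feed-forward network. The layer sizes, input feature dimension, and initialization below are illustrative assumptions, not the configuration reported in the paper:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class AzimuthDNN:
    """Feed-forward net mapping a multi-channel feature vector to a
    probabilistic map over 72 azimuth bins (5-degree resolution).

    Hidden-layer widths and the input dimension are hypothetical.
    """
    def __init__(self, in_dim=256, hidden=(512, 512, 512, 512), n_bins=72, seed=0):
        rng = np.random.default_rng(seed)
        dims = (in_dim, *hidden, n_bins)
        # He initialization for ReLU layers
        self.weights = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def forward(self, x):
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = np.maximum(x @ W + b, 0.0)      # ReLU hidden layers
        return softmax(x @ self.weights[-1] + self.biases[-1])

    def predict_azimuth(self, x):
        # centre of the most probable 5-degree bin, in degrees
        return int(np.argmax(self.forward(x))) * 5
```

Treating localization as classification over discrete bins, rather than as regression of a single angle, is what yields the probabilistic location map: the softmax output assigns a posterior probability to each of the 72 directions.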
More From: Proceedings of the Northern Lights Deep Learning Workshop