Abstract
This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for robust binaural localisation of multiple sources in reverberant environments. DNNs are used to learn the relationship between the source azimuth and binaural cues, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs). In contrast to many previous binaural hearing systems, the proposed approach is not restricted to localisation of sound sources in the frontal hemifield. Due to the similarity of binaural cues in the frontal and rear hemifields, front-back confusions often occur. To address this, a head movement strategy is incorporated in the localisation model to help reduce the front-back errors. The proposed DNN system is compared to a Gaussian mixture model (GMM) based system that employs interaural time differences (ITDs) and ILDs as localisation features. Our experiments show that the DNN is able to exploit information in the CCF that is not available in the ITD cue, which together with head movements substantially improves localisation accuracies under challenging acoustic scenarios in which multiple talkers and room reverberation are present.
Highlights
This paper aims to reduce the performance gap between human and machine sound localisation in conditions where multiple sound sources and room reverberation are present.
When localisation is restricted to the frontal hemifield, deep neural networks (DNNs) can effectively extract cues from clean cross-correlation function (CCF) and interaural level difference (ILD) features that remain robust in the presence of reverberation.
This paper presents a machine-hearing framework that combines DNNs and head movements for robust localisation of multiple sources in reverberant conditions.
Summary
This paper aims to reduce the performance gap between human and machine sound localisation in conditions where multiple sound sources and room reverberation are present. Sound localisation by machine systems is usually unreliable in the presence of interfering sources and reverberation. This is the case even when an array of multiple microphones is employed [2], as opposed to the two (binaural) sensors available to human listeners. A number of authors have proposed binaural sound localisation systems that take the same approach, extracting ITDs and ILDs from acoustic recordings made at each ear of an artificial head [3]–[6]. In contrast to many previous machine systems, the approach proposed here is not restricted to sound localisation in the frontal hemifield; we consider source positions in the full 360° azimuth range around the head. In this unconstrained case, the location of a sound cannot be uniquely determined from ITDs and ILDs alone; because these cues are similar in the frontal and rear hemifields, front-back confusions occur [8].
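To make the binaural cues concrete, the following is a minimal sketch (not the paper's implementation) of how a CCF, ITD and ILD might be extracted from one binaural frame. The function name, the circular-shift approximation of lagged correlation, and the ±1 ms lag range are illustrative assumptions; the actual system uses these cues per frequency channel of an auditory filterbank.

```python
import numpy as np

def binaural_cues(left, right, fs, max_itd=1e-3):
    """Illustrative extraction of CCF, ITD and ILD from one binaural frame.

    left, right : same-length 1-D arrays of ear signals.
    max_itd     : lag search limit in seconds (~1 ms is a plausible
                  range for a human-sized head).
    """
    max_lag = int(max_itd * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    # Cross-correlation over the lag range (np.roll wraps around,
    # an approximation that is adequate for frames much longer
    # than the maximum lag).
    ccf = np.array([np.sum(left * np.roll(right, lag)) for lag in lags])
    # Normalise by the signal energies.
    ccf = ccf / (np.sqrt(np.sum(left**2) * np.sum(right**2)) + 1e-12)
    # ITD: lag of the CCF peak, converted to seconds.
    itd = lags[np.argmax(ccf)] / fs
    # ILD: energy ratio between the two ears, in dB.
    ild = 10 * np.log10((np.sum(left**2) + 1e-12) /
                        (np.sum(right**2) + 1e-12))
    return ccf, itd, ild
```

A GMM-based localiser would use only the summary statistics (ITD, ILD), whereas the DNN described here is fed the complete CCF vector together with the ILD, so it can exploit secondary peaks and peak shapes that the single ITD value discards.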
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing