Abstract

This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for robust binaural localisation of multiple sources in reverberant environments. DNNs are used to learn the relationship between the source azimuth and binaural cues, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs). In contrast to many previous binaural hearing systems, the proposed approach is not restricted to localisation of sound sources in the frontal hemifield. Due to the similarity of binaural cues in the frontal and rear hemifields, front-back confusions often occur. To address this, a head movement strategy is incorporated in the localisation model to help reduce the front-back errors. The proposed DNN system is compared to a Gaussian mixture model (GMM) based system that employs interaural time differences (ITDs) and ILDs as localisation features. Our experiments show that the DNN is able to exploit information in the CCF that is not available in the ITD cue, which together with head movements substantially improves localisation accuracies under challenging acoustic scenarios in which multiple talkers and room reverberation are present.
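As an illustrative sketch only (not the authors' implementation, which operates per frequency channel of an auditory filterbank), the two binaural cues named in the abstract can be computed for a single signal frame roughly as follows. The function name, lag range, and sine-law conventions here are assumptions for illustration:

```python
import numpy as np

def binaural_features(left, right, fs, max_itd_ms=1.0):
    """Compute a normalised cross-correlation function (CCF) over the
    physically plausible lag range, plus the interaural level
    difference (ILD) in dB, for one frame of left/right ear signals.

    Convention: a positive lag indicates the right-ear signal leads.
    """
    max_lag = int(fs * max_itd_ms / 1000)        # e.g. +/- 1 ms of lag
    lags = np.arange(-max_lag, max_lag + 1)
    # Full cross-correlation covers lags -(N-1)..(N-1); keep the centre.
    full = np.correlate(left, right, mode='full')
    center = len(right) - 1
    ccf = full[center - max_lag : center + max_lag + 1]
    # Normalise by the signal energies so the CCF peak is <= 1.
    norm = np.sqrt(np.sum(left**2) * np.sum(right**2)) + 1e-12
    ccf = ccf / norm
    # ILD: energy ratio between the two ears, in dB.
    ild = 10 * np.log10((np.sum(left**2) + 1e-12) /
                        (np.sum(right**2) + 1e-12))
    return lags, ccf, ild
```

The key idea the abstract highlights is that the DNN consumes the complete CCF vector rather than only the location of its peak (the ITD), so information in the shape of the correlation function is preserved.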

Highlights

  • This paper aims to reduce the gap in performance between human and machine sound localisation in conditions where multiple sound sources and room reverberation are present

  • When localisation is restricted to the frontal hemifield, deep neural networks (DNNs) can effectively extract cues from the combined cross-correlation function (CCF) and interaural level difference (ILD) features that are robust in the presence of reverberation

  • This paper presents a machine-hearing framework that combines DNNs and head movements for robust localisation of multiple sources in reverberant conditions

Summary

INTRODUCTION

This paper aims to reduce the gap in performance between human and machine sound localisation in conditions where multiple sound sources and room reverberation are present. Sound localisation by machine systems is usually unreliable in the presence of interfering sources and reverberation. This is the case even when an array of multiple microphones is employed [2], as opposed to the two (binaural) sensors available to human listeners. A number of authors have proposed binaural sound localisation systems that take a similar approach, extracting ITDs and ILDs from acoustic recordings made at each ear of an artificial head [3]–[6]. In contrast to many previous machine systems, the approach proposed here is not restricted to sound localisation in the frontal hemifield; we consider source positions in the 360◦ azimuth range around the head. In this unconstrained case, the location of a sound cannot be uniquely determined by ITDs and ILDs; due to the similarity of these cues in the frontal and rear hemifields, front-back confusions occur [8].
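The front-back ambiguity arises because a frontal source at azimuth θ and a rear source at 180° − θ produce nearly identical ITDs; rotating the head shifts the two candidates' cues in opposite directions, which is what the paper's head movement strategy exploits. The following sketch illustrates the principle only, using an assumed sine-law ITD model for a spherical head rather than the paper's DNN-based method:

```python
import numpy as np

HEAD_RADIUS = 0.09   # m, assumed spherical-head radius
SPEED_SOUND = 343.0  # m/s

def itd_model(azimuth_deg):
    """Sine-law ITD approximation for a spherical head (illustrative)."""
    return (HEAD_RADIUS / SPEED_SOUND) * np.sin(np.radians(azimuth_deg))

def resolve_front_back(azimuth_hyp_deg, itd_after_rotation, rotation_deg):
    """Given a frontal azimuth hypothesis and the ITD measured after
    rotating the head by rotation_deg, return the front or rear
    candidate whose predicted post-rotation ITD matches best."""
    front = azimuth_hyp_deg
    rear = 180.0 - azimuth_hyp_deg      # front-back mirror position
    return min(
        (front, rear),
        key=lambda az: abs(itd_model(az - rotation_deg) - itd_after_rotation),
    )
```

Before rotation the two candidates are indistinguishable under this model (sin θ = sin(180° − θ)); after rotation their predicted ITDs diverge, so a single measurement disambiguates them.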

SYSTEM
Binaural Feature Extraction
DNN Localisation
Localisation With Head Movements
Binaural Simulation
Multi-conditional Training
Experimental Setup
RESULTS AND DISCUSSION
Contribution of the ILD Cue
Benefit of the Head Movement Strategy
CONCLUSION