Abstract

In binaural sound source localisation, front–back confusion is a challenging problem when localising sources in noisy or reverberant environments. To address this issue, a novel algorithm that fuses a deep neural network (DNN) and a convolutional neural network (CNN) is proposed. First, joint features, consisting of interaural level differences (ILDs) and the cross-correlation function (CCF) within a limited lag range, are extracted from the binaural signals. Second, with the extracted CCF–ILD features, a CNN is used for the front–back classification task, while a DNN is used for the azimuth classification task. The front–back features extracted by the CNN are leveraged as additional information for the sound source localisation task. In addition, an angle-loss function is designed to alleviate overfitting and to improve the generalisation ability of the method in adverse acoustic conditions. Finally, the two branches are concatenated and followed by an output layer, which generates the posterior probability of each candidate azimuth; the azimuth with the maximum posterior probability is chosen as the direction of the sound source. Experimental results demonstrate the effectiveness of the authors' method for front–back decision and azimuth estimation in noisy and reverberant environments.

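The abstract does not fix the feature or network dimensions, so the following PyTorch sketch only illustrates the described two-branch structure (a CNN branch for front–back cues and a fully connected branch for azimuth cues, concatenated before the output layer). The number of filterbank channels, the CCF lag range, the azimuth grid and all layer sizes are assumptions, and the paper's angle-loss function is not reproduced; a standard cross-entropy loss would stand in for it during training.

import torch
import torch.nn as nn

# Hypothetical dimensions -- not specified in the abstract.
N_BANDS = 32        # assumed number of auditory filterbank channels
N_LAGS = 37         # assumed number of CCF lags per band
N_AZIMUTHS = 72     # assumed azimuth grid (e.g. 5-degree steps over 360 degrees)

class DualBranchLocaliser(nn.Module):
    # Sketch of a two-branch model: a CNN branch over the CCF-ILD map for
    # front-back cues, a DNN (fully connected) branch for azimuth cues,
    # concatenation, then an output layer producing azimuth posteriors.
    def __init__(self):
        super().__init__()
        # CNN branch: treats the (bands x (lags + 1 ILD)) feature map as an image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
        )
        # DNN branch: fully connected layers over the flattened CCF-ILD vector.
        in_dim = N_BANDS * (N_LAGS + 1)
        self.dnn = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Output layer over the concatenated branch features.
        self.head = nn.Linear(64 + 128, N_AZIMUTHS)

    def forward(self, ccf_ild):
        # ccf_ild: (batch, N_BANDS, N_LAGS + 1), i.e. CCF lags plus one ILD per band.
        fb = self.cnn(ccf_ild.unsqueeze(1))          # front-back features
        az = self.dnn(ccf_ild.flatten(start_dim=1))  # azimuth features
        return self.head(torch.cat([fb, az], dim=1)) # logits; softmax gives posteriors

if __name__ == "__main__":
    model = DualBranchLocaliser()
    x = torch.randn(8, N_BANDS, N_LAGS + 1)          # dummy batch of CCF-ILD features
    posteriors = torch.softmax(model(x), dim=1)
    estimated_azimuth = posteriors.argmax(dim=1)     # index of the most probable azimuth
    print(estimated_azimuth.shape)                   # torch.Size([8])

The final argmax over the softmax output mirrors the abstract's decision rule of selecting the azimuth with the maximum posterior probability.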