Abstract

Sound source localization (SSL) is an important technique for many audio processing systems, such as speech enhancement/recognition and human-robot interaction. Although many methods have been proposed for SSL, it still remains a challenging task to achieve accurate localization under adverse acoustic scenarios. In this paper, a novel binaural SSL method based on time-frequency convolutional neural network (TF-CNN) with multitask learning is proposed to simultaneously localize azimuth and elevation under unknown acoustic conditions. First, the interaural phase difference and interaural level difference are extracted from the received binaural signals, which are taken as the input of the proposed SSL neural network. Then, an SSL neural network is designed to map the interaural cues to sound direction, which consists of TF-CNN module and multitask neural network. The TF-CNN module learns and combines the time-frequency information of extracted interaural cues to generate the shared feature for multitask SSL. With the shared feature, a multitask neural network is designed to simultaneously estimate azimuth and elevation through multitask learning, which generates the posterior probability for candidate directions. Finally, the candidate direction with the highest probability is taken as the final direction estimation. The experiments based on public head-related transfer function (HRTF) database demonstrate that the proposed method achieves preferable localization performance compared with other popular methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call