Exploiting Periodicity Features for Joint Detection and DOA Estimation of Speech Sources Using Convolutional Neural Networks

Reza Varzandeh, Simon Doclo, Volker Hohmann, Kamil Adiloglu

doi:10.1109/icassp40776.2020.9054754

Abstract

While many algorithms treat direction of arrival (DOA) estimation and voice activity detection (VAD) as two separate tasks, only a small number of data-driven methods have addressed them jointly. In this paper, a multi-input single-output convolutional neural network (CNN) is proposed that exploits a novel feature combination for joint DOA estimation and VAD in the context of binaural hearing aids. In addition to the well-known generalized cross-correlation with phase transform (GCC-PHAT) feature, the network uses an auditory-inspired feature called periodicity degree (PD), which provides a broadband representation of the periodic structure of the signal. The proposed CNN has been trained in a multi-conditional training scheme across different signal-to-noise ratios. Experimental results for a single-talker scenario in reverberant environments show that, by exploiting the PD feature, the proposed CNN is able to distinguish speech from non-speech signal blocks, thereby outperforming the baseline CNN in terms of DOA estimation accuracy. In addition, the results show that the proposed method is able to adapt to different unseen acoustic conditions and background noises.
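For readers unfamiliar with the GCC-PHAT feature mentioned above, the following is a minimal NumPy sketch of the textbook formulation: the cross-power spectrum of two channels is normalized to unit magnitude (the phase transform) and transformed back to the lag domain, where the peak indicates the inter-channel delay. It is an illustration only, not the feature extraction used in the paper; the function name `gcc_phat`, the `max_lag` parameter, and the regularization constant are assumptions made for this example.

```python
import numpy as np

def gcc_phat(sig, ref, max_lag):
    """Textbook GCC-PHAT between two equal-rate signals (illustrative sketch).

    Returns (lags, cc): integer lags in samples from -max_lag to +max_lag and
    the phase-transform-weighted cross-correlation at those lags. A peak at a
    positive lag means `sig` is delayed relative to `ref`.
    Assumes max_lag >= 1.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, normalized to unit magnitude (phase transform).
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Reorder so that negative lags come first, then lag 0, then positive lags.
    max_lag = min(max_lag, n // 2)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


# Usage: recover a known 5-sample delay between two synthetic noise channels.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))  # ref delayed by 5 samples
lags, cc = gcc_phat(sig, ref, max_lag=16)
estimated_delay = int(lags[np.argmax(cc)])
```

In a binaural setting, the two inputs would be short blocks from the left and right microphones, and the restricted lag range would be chosen to cover the physically possible inter-microphone delays.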

Full Text