Abstract

Sound event detection (SED) refers to recognizing sound events, and the prevailing SED methods at present employ deep neural networks. However, their detection performance for sound events accompanied by background speech remains largely unexplored. In this paper, we therefore propose a Convolutional Recurrent Neural Network (CRNN) to detect overlapping sound events in multi-channel audio under speech interference from multiple speakers. The approach consists of two modules, a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN): the CNN module maps acoustic features to the input of the RNN, and the RNN models the temporal dynamics using Bi-directional Gated Recurrent Units (Bi-GRU). We then conduct experiments on SED data with speech interference. The experimental results indicate that, compared with conventional SED approaches, the proposed approach remains robust in detecting sound events when speech signals are present as background noise.
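
Below is a minimal sketch of a CRNN of the kind described above: a CNN front-end over multi-channel spectrogram features followed by a Bi-GRU and a frame-wise multi-label classifier. All layer sizes, the number of mel bands, the channel count, and the class count are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical CRNN sketch (PyTorch) for multi-channel sound event detection.
# Hyperparameters (channels, mel bands, classes, hidden sizes) are assumptions.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_channels=4, n_mels=64, n_classes=10,
                 cnn_channels=(64, 64, 64), gru_hidden=64):
        super().__init__()
        layers, in_ch = [], n_channels
        # Each CNN block: conv -> batch norm -> ReLU -> pooling along frequency only,
        # so the time resolution of the frame-level predictions is preserved.
        for out_ch in cnn_channels:
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),  # pool frequency axis, keep time
            ]
            in_ch = out_ch
        self.cnn = nn.Sequential(*layers)
        freq_out = n_mels // (2 ** len(cnn_channels))
        # Bi-directional GRU models the temporal context of the CNN feature maps.
        self.gru = nn.GRU(input_size=in_ch * freq_out, hidden_size=gru_hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * gru_hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, time, n_mels) multi-channel log-mel spectrogram,
        # with audio channels stacked as CNN input planes.
        b, _, t, _ = x.shape
        h = self.cnn(x)                               # (batch, C, time, freq')
        h = h.permute(0, 2, 1, 3).reshape(b, t, -1)   # (batch, time, C * freq')
        h, _ = self.gru(h)                            # (batch, time, 2 * gru_hidden)
        return torch.sigmoid(self.classifier(h))      # frame-wise event activity

# Example: batch of 8 clips, 4 microphone channels, 500 frames, 64 mel bands.
probs = CRNN()(torch.randn(8, 4, 500, 64))
print(probs.shape)  # torch.Size([8, 500, 10])
```

Pooling only along the frequency axis keeps one prediction per input frame, which matches the frame-level nature of SED; the sigmoid output allows overlapping events to be active simultaneously.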
