Abstract

Sound event detection (SED) refers to recognizing sound events, and the prevailing SED methods at present employ deep neural networks. However, their detection performance for sound events accompanied by background speech remains largely unexplored. In this paper, we therefore propose a Convolutional Recurrent Neural Network (CRNN) to detect overlapping sound events in multi-channel audio under speech interference from multiple speakers. The approach consists of two modules, a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN): the CNN module maps acoustic features to the input of the RNN, and the RNN models the temporal dynamics using Bi-directional Gated Recurrent Units (Bi-GRU). We then conduct experiments on SED data with speech interference. The experimental results indicate that, compared with conventional SED approaches, the proposed approach remains robust in detecting sound events when speech signals are present as background noise.
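
Below is a minimal sketch of a CRNN of the kind described above: a CNN front-end over multi-channel spectrogram features followed by a Bi-GRU and a frame-wise multi-label classifier. All layer sizes, the number of mel bands, the channel count, and the class count are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical CRNN sketch (PyTorch) for multi-channel sound event detection.
# Hyperparameters (channels, mel bands, classes, hidden sizes) are assumptions.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_channels=4, n_mels=64, n_classes=10,
                 cnn_channels=(64, 64, 64), gru_hidden=64):
        super().__init__()
        layers, in_ch = [], n_channels
        # Each CNN block: conv -> batch norm -> ReLU -> pooling along frequency only,
        # so the time resolution of the frame-level predictions is preserved.
        for out_ch in cnn_channels:
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),  # pool frequency axis, keep time
            ]
            in_ch = out_ch
        self.cnn = nn.Sequential(*layers)
        freq_out = n_mels // (2 ** len(cnn_channels))
        # Bi-directional GRU models the temporal context of the CNN feature maps.
        self.gru = nn.GRU(input_size=in_ch * freq_out, hidden_size=gru_hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * gru_hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, time, n_mels) multi-channel log-mel spectrogram,
        # with audio channels stacked as CNN input planes.
        b, _, t, _ = x.shape
        h = self.cnn(x)                               # (batch, C, time, freq')
        h = h.permute(0, 2, 1, 3).reshape(b, t, -1)   # (batch, time, C * freq')
        h, _ = self.gru(h)                            # (batch, time, 2 * gru_hidden)
        return torch.sigmoid(self.classifier(h))      # frame-wise event activity

# Example: batch of 8 clips, 4 microphone channels, 500 frames, 64 mel bands.
probs = CRNN()(torch.randn(8, 4, 500, 64))
print(probs.shape)  # torch.Size([8, 500, 10])
```

Pooling only along the frequency axis keeps one prediction per input frame, which matches the frame-level nature of SED; the sigmoid output allows overlapping events to be active simultaneously.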
