Abstract

Sound event detection has recently become a hot topic in sound processing, yet discontinuous and overlapping sound events still pose challenges. In this paper, we propose a capsule network with pixel-based attention and a bidirectional gated recurrent unit (PBA-AttCapsNet-BGRU), which consists of a high-level feature extraction module, an attention capsule network (AttCapsNet) module, and a bidirectional gated recurrent unit (BGRU) module. Specifically, in the high-level feature extraction module, pixel-based attention (PBA) is employed in a convolutional neural network, named PBACNN, to extract features more relevant to sound events from binaural log-Mel spectrogram (bin-LMS) features. This module addresses the problem of discontinuous sound events. Furthermore, to detect overlapping sound events effectively, we propose an AttCapsNet module that combines a capsule network (CapsNet) with a soft attention mechanism. We also introduce an attention dynamic routing algorithm to distinguish the presence of sound events effectively and to focus on the significant frames. In addition, the BGRU module is composed of BGRU and time-distributed fully connected layers; it captures context information and mitigates overfitting to a certain extent. We conducted experiments on Task 4 of the DCASE 2017 Challenge. Experimental results show that the proposed PBA-AttCapsNet-BGRU model improves F1 by 0.032 and ER by 0.07 compared with state-of-the-art sound event detection models.
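The abstract reports F1 and ER without defining them. As a minimal sketch, assuming these are the segment-based F-score and error rate commonly used in DCASE evaluations (an assumption, since the abstract does not say so), the two scores can be computed from binary activity matrices as shown below. The function name `segment_based_scores` and the array layout are hypothetical and chosen only for illustration. Under this definition a lower ER is better, so an improvement in ER would correspond to a reduction.

```python
import numpy as np


def segment_based_scores(reference, estimated):
    """Compute segment-based F1 and error rate (ER).

    Hypothetical helper for illustration. Both inputs are binary arrays of
    shape (n_segments, n_classes), where 1 marks an active event class.
    """
    ref = np.asarray(reference, dtype=bool)
    est = np.asarray(estimated, dtype=bool)
    if ref.shape != est.shape:
        raise ValueError("reference and estimated must have the same shape")

    # Per-segment counts across event classes.
    tp = np.sum(ref & est, axis=1)
    fp = np.sum(~ref & est, axis=1)
    fn = np.sum(ref & ~est, axis=1)

    n_ref = int(ref.sum())  # total number of active reference events
    if n_ref == 0:
        raise ValueError("ER is undefined without active reference events")

    # A false positive paired with a false negative in the same segment
    # counts as one substitution; the remainder are deletions or insertions.
    substitutions = np.minimum(fp, fn).sum()
    deletions = np.maximum(0, fn - fp).sum()
    insertions = np.maximum(0, fp - fn).sum()

    denom = 2 * tp.sum() + fp.sum() + fn.sum()
    f1 = float(2 * tp.sum() / denom) if denom > 0 else 0.0
    er = float((substitutions + deletions + insertions) / n_ref)
    return f1, er


if __name__ == "__main__":
    # Three segments, two event classes (toy data, not from the paper).
    reference = [[1, 0], [1, 1], [0, 0]]
    estimated = [[1, 0], [0, 1], [1, 0]]
    f1, er = segment_based_scores(reference, estimated)
    print(f"F1 = {f1:.3f}, ER = {er:.3f}")
```

Because substitutions are counted per segment, ER can exceed 1 when a system produces many insertions, which is why it is usually reported alongside F1 rather than in place of it.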
