Abstract
There are problems that standard square convolution kernel has insufficient representation ability and recurrent neural network usually ignores the importance of different elements within an input vector in sound event localization and detection. This paper proposes an element-wise attention gate-asymmetric convolutional recurrent neural network (EleAttG-ACRNN), to improve the performance of sound event localization and detection. First, a convolutional neural network with context gating and asymmetric squeeze excitation residual is constructed, where asymmetric convolution enhances the capability of the square convolution kernel; squeeze excitation can improve the interdependence between channels; context gating can weight the important features and suppress the irrelevant features. Next, in order to improve the expressiveness of the model, we integrate the element-wise attention gate into the bidirectional gated recurrent network, which is to highlight the importance of different elements within an input vector, and further learn the temporal context information. Evaluation results using the TAU Spatial Sound Events 2019-Ambisonic dataset show the effectiveness of the proposed method, and it improves SELD performance up to 0.05 in error rate, 1.7% in F-score, 0.7° in DOA error, and 4.5% in Frame recall compared to a CRNN method.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.