Abstract

The polyphonic sound source localization and detection (SSLD) task aims to recognize the categories of sound events, identify their onset and offset times, and estimate their corresponding direction-of-arrival (DOA), where polyphonic refers to the occurrence of multiple overlapping sound sources in a segment. However, vanilla SSLD methods based on convolutional recurrent neural networks (CRNNs) suffer from insufficient feature extraction. The single-scale convolution kernels in a CRNN fail to adequately extract the multi-scale features of sound events, which have diverse time-frequency characteristics. As a result, the extracted features lack the fine-grained information that helps localize sound sources. To address these challenges, we propose a polyphonic SSLD network based on global-local feature extraction and recalibration (GLFER-Net), in which the global-local feature (GLF) extractor is designed to extract multi-scale global features through an omni-directional dynamic convolution (ODConv) layer and a multi-scale feature extraction (MSFE) module. The local feature extraction (LFE) unit is designed to capture detailed information. In addition, we design a feature recalibration (FR) module to emphasize the crucial features along multiple dimensions. On the open datasets of Task 3 in the DCASE 2021 and 2022 Challenges, we compare the proposed GLFER-Net with six and four SSLD methods, respectively. The results show that GLFER-Net achieves competitive performance. A series of ablation experiments and visualization analyses verify that the designed modules are effective.
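To make the multi-scale and recalibration ideas concrete, the following is a minimal toy sketch in plain Python. It is not the MSFE, ODConv, or FR module of GLFER-Net, whose details the abstract does not give. It assumes fixed averaging kernels of sizes 1, 3, and 5 on a 1-D signal, and a parameter-free sigmoid gate on each channel mean in the style of squeeze-and-excitation. All function names are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernel sizes only)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale_features(signal, kernel_sizes=(1, 3, 5)):
    """Return one channel per kernel size; each channel uses an averaging kernel.

    Assumption: real networks learn the kernel weights; fixed averaging
    kernels are used here only to show how the receptive field changes.
    """
    return [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]


def recalibrate(channels):
    """Scale each channel by a sigmoid gate of its mean value.

    Assumption: this is a parameter-free stand-in for a learned
    recalibration module; it only illustrates channel-wise reweighting.
    """
    gates = [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in channels]
    scaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return scaled, gates


# A single short event in an otherwise silent segment.
signal = [0.0, 0.0, 3.0, 0.0, 0.0]
features = multi_scale_features(signal)
recalibrated, gates = recalibrate(features)
```

With this input, the size-1 channel keeps the event sharp, while the size-3 and size-5 channels spread it over progressively wider contexts. A single kernel size would give only one of these views, which is the limitation the abstract attributes to vanilla CRNNs.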

