Abstract
Sound Event Detection (SED) requires identifying the sound events in a recording and detecting their onset and offset times. The former calls for features with both long- and short-term dependencies so that events of different durations can be detected, while the latter needs fine-grained dependencies. Although our previously proposed Multi-Scale Fully Convolutional Networks (MS-FCN) uses cascaded dilated convolutions to model temporal context and takes multi-scale information into account, it has two shortcomings: it ignores neighboring information and fine-grained dependencies, and it neglects intermediate-length temporal dependencies. The first shortcoming stems from the element-skipping sampling mechanism of dilated convolution. To overcome it, this paper proposes the dilated mixed convolution module, which mixes dilated and standard convolutions to capture both fine-grained and long-term dependencies and to give weight to neighboring information. The second shortcoming arises because the temporal dependency length grows too quickly in the cascaded dilated convolution module, so much of the intermediate temporal information is ignored. To address it, this paper proposes the Dilated Temporal Pyramid Pooling (DTPP) module, in which parallel dilated convolutions with multiple dilation factors capture the intermediate temporal information at appropriate temporal dependency lengths. Because the cascaded module has been shown to model temporal context effectively and efficiently in MS-FCN, and the DTPP module can capture the temporal information that the cascaded module misses, this paper combines the two in a cascaded parallel module that captures richer temporal dependencies. Building on this module, Multi-Scale Feature Fusion Networks (MSFF-Net) is proposed, which achieves competitive performance on three open datasets.
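The two shortcomings described in the abstract can be illustrated with simple receptive-field arithmetic. The sketch below is not the paper's implementation: the kernel size (3), the dilation factors, and the helper names `tap_offsets` and `cascaded_receptive_fields` are all assumptions chosen for illustration, and the "mixed" set is just the union of the frames read by one dilated and one standard convolution.

```python
def tap_offsets(kernel_size, dilation):
    """Input frame offsets (relative to the output frame) read by one 1-D convolution."""
    half = kernel_size // 2
    return [dilation * (i - half) for i in range(kernel_size)]


def cascaded_receptive_fields(kernel_size, dilations):
    """Receptive field, in frames, after each layer of a cascaded stack of 1-D convolutions."""
    fields, rf = [], 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k - 1) * d
        fields.append(rf)
    return fields


# Shortcoming 1: a dilated kernel skips the neighboring frames.
dilated = tap_offsets(3, 4)    # frames read with dilation 4 (assumed value)
standard = tap_offsets(3, 1)   # frames read by a standard convolution
mixed = sorted(set(dilated) | set(standard))  # union: neighbors are covered again

# Shortcoming 2: with doubling dilations (assumed), the receptive field
# jumps quickly and intermediate lengths are never produced.
cascade = cascaded_receptive_fields(3, [1, 2, 4, 8])

# Parallel branches with other dilation factors (assumed) give lengths
# that fall between the jumps of the cascade.
parallel = [cascaded_receptive_fields(3, [d])[0] for d in (3, 5, 6)]

print("dilated taps :", dilated)
print("mixed taps   :", mixed)
print("cascade RFs  :", cascade)
print("parallel RFs :", parallel)
```

Under these assumed settings, the cascade yields receptive fields of 3, 7, 15 and 31 frames, so lengths such as 11 or 13 frames only appear through the parallel branches. This is the kind of gap the DTPP module is described as filling.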