Abstract

Existing Sound Event Detection (SED) algorithms pay too much attention to the differences between frames inside an event and too little to event boundaries. This leads to event splitting, false negatives, and inaccurate onset and offset times, all of which reduce SED performance. To address this problem, this paper proposes Soft-Median Selection (SMS), which adaptively smooths frame features along the time axis. First, the Differentiable Soft-Median Filter (DSMF) is designed so that median-style filtering can be applied inside a neural network. The DSMF addresses the problem that gradients cannot propagate through a conventional median filter, or propagate only non-smoothly. Second, several DSMFs and a Linear Selection are combined to form the SMS. DSMFs of different lengths smooth the features to different degrees, and the Linear Selection adaptively combines the smoothed features. Because the weight of each DSMF is learned, SMS smooths features adaptively without parameters set in advance and therefore generalizes well. Experimental results show that the proposed SMS-based SED algorithm improves the detection accuracy of event edges and makes predictions inside sound events more stable. Its Event-based F1 Score (EBFS) is 21.7% higher than the baseline and 3.0% higher than the winning algorithm in Task 4 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019.
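
The abstract does not give the DSMF formula, so the following is only a minimal sketch of the general idea, not the paper's formulation. It assumes a soft median computed as a softmax-weighted average that favors values close to the window median, and a Linear Selection modeled as a plain weighted sum of the filter outputs. The names `soft_median`, `soft_median_filter`, `soft_median_selection`, and the `temperature` parameter are hypothetical, and the weights are fixed by hand here, whereas in the paper they are learned.

```python
import math
from statistics import median


def soft_median(window, temperature=0.1):
    """Softmax-weighted average that concentrates on values near the median.

    Assumption: this is one common way to soften a median, not necessarily
    the DSMF defined in the paper.
    """
    m = median(window)
    scores = [-abs(v - m) / temperature for v in window]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sum(e * v for e, v in zip(exps, window)) / total


def soft_median_filter(signal, length, temperature=0.1):
    """Slide a soft-median window of the given odd length over the time axis."""
    half = length // 2
    n = len(signal)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)  # truncate at the edges
        out.append(soft_median(signal[lo:hi], temperature))
    return out


def soft_median_selection(signal, lengths, weights, temperature=0.1):
    """Weighted sum of soft-median filters of different lengths.

    Assumption: the Linear Selection is a plain linear combination; the
    weights would be learned in training and are passed in by hand here.
    """
    filtered = [soft_median_filter(signal, k, temperature) for k in lengths]
    return [
        sum(w * f[t] for w, f in zip(weights, filtered))
        for t in range(len(signal))
    ]


# Frame-level activity with a one-frame dip inside an event (index 4).
frames = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
smoothed = soft_median_filter(frames, length=3)
mixed = soft_median_selection(frames, lengths=[1, 3], weights=[0.5, 0.5])
```

In this toy input the length-3 filter fills the one-frame dip that would otherwise split the event in two, while the length-1 branch leaves the signal unchanged. The weights therefore control how strongly the combined output is smoothed.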
