Abstract

Overlapped sound event classification (SEC) is a challenging task, especially when the number of possible event classes or the number of simultaneously occurring events (the polyphony level) is large. In such cases, effectively training a multi-label SEC neural network is difficult, because sufficient and diverse data must be available for each of the combinatorially many possible event sets. To alleviate this problem, we examine the combination and joint training of a multi-channel sound source separation network with a multi-label SEC network. With the separation module acting as a pre-processing step, the task is approximately reduced to isolated SEC, thereby avoiding the training complexity of overlapped scenarios. In addition, we introduce a multi-channel polyphony detection module that is trained so that, during testing, the separation network is applied only to overlapping instances. We evaluate our approaches on a multi-channel dataset of overlapping sound events originating from 50 different classes. Under moderate reverberation conditions, the proposed method achieves up to 7.7% absolute improvement in F-score in the overlapped scenarios, compared to the baseline approach with traditional multi-label training.
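The test-time control flow described above (detect polyphony, separate only when events overlap, then classify) can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `classify_with_optional_separation`, the `Audio` type alias, and the three callables `is_overlapped`, `separate`, and `classify` are hypothetical placeholders standing in for the polyphony detector, the separation network, and the SEC network. Taking the union of the per-source labels is likewise an assumption about how the outputs might be merged.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical representation of a multi-channel clip: channels x samples.
Audio = Sequence[Sequence[float]]


def classify_with_optional_separation(
    mixture: Audio,
    is_overlapped: Callable[[Audio], bool],   # stand-in for the polyphony detector
    separate: Callable[[Audio], List[Audio]],  # stand-in for the separation network
    classify: Callable[[Audio], Set[str]],     # stand-in for the SEC network
) -> Set[str]:
    """Return the set of predicted event labels for one clip."""
    if not is_overlapped(mixture):
        # No overlap detected: skip separation and classify the clip directly.
        return classify(mixture)

    # Overlap detected: classify each separated source as an isolated event
    # and merge the per-source labels (assumed merge rule: set union).
    labels: Set[str] = set()
    for source in separate(mixture):
        labels |= classify(source)
    return labels
```

Passing the three components as callables keeps the sketch independent of any particular network architecture. In practice each would wrap a trained model.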
