Abstract
Audio classification aims to discriminate between different types of audio signals and has received intensive attention due to its wide range of applications. In deep learning-based audio classification methods, researchers usually transform raw audio signals into different feature representations (such as the Short-Time Fourier Transform and Mel-Frequency Cepstral Coefficients) as inputs to the networks. However, selecting a feature representation requires expert knowledge and extensive experimental verification. Moreover, using a single type of feature representation may lead to suboptimal results, as the information carried by different kinds of representations can be complementary. Previous work has shown that ensembling networks trained on different representations can greatly boost classification performance. However, making inferences with multiple networks is cumbersome and computationally expensive. In this paper, we propose a novel end-to-end collaborative training framework for the audio classification task. The framework takes multiple representations as inputs and trains the networks jointly with a knowledge distillation method. Consequently, our framework significantly improves the performance of the networks without increasing the computational overhead at the inference stage. Extensive experimental results demonstrate that the proposed approach improves classification performance and achieves competitive results on both acoustic scene classification and general audio tagging tasks.
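The abstract does not specify the exact training objective, but the described setup (networks on different representations trained jointly via knowledge distillation) can be illustrated with a generic mutual-distillation loss in the style of deep mutual learning. The sketch below is an assumption for illustration only: the function names, the temperature `T`, and the weighting `alpha` are hypothetical, and each network's loss combines its own cross-entropy with a temperature-softened KL term toward its peer's predictions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mutual_distillation_losses(logits_a, logits_b, label, T=2.0, alpha=0.5):
    """Hypothetical per-network collaborative loss:
    (1 - alpha) * CE(own, label) + alpha * T^2 * KL(peer_soft || own_soft).
    logits_a / logits_b could come from, e.g., an STFT-input network
    and an MFCC-input network processing the same audio clip."""
    pa, pb = softmax(logits_a), softmax(logits_b)      # hard predictions
    sa, sb = softmax(logits_a, T), softmax(logits_b, T)  # softened predictions
    ce_a = -np.log(pa[label] + 1e-12)
    ce_b = -np.log(pb[label] + 1e-12)
    loss_a = (1 - alpha) * ce_a + alpha * (T ** 2) * kl_div(sb, sa)
    loss_b = (1 - alpha) * ce_b + alpha * (T ** 2) * kl_div(sa, sb)
    return loss_a, loss_b
```

At inference time only one network (a single set of logits) is used, which is how such a scheme avoids the runtime cost of an ensemble while still benefiting from the peer's knowledge during training.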