Abstract

This paper introduces a novel knowledge distillation (KD) framework that distills knowledge from multiple deep models trained on sub-band spectrograms obtained by dividing the full spectrogram into multiple frequency sub-bands. The deep models trained on the sub-band spectrograms prevent information loss during knowledge distillation from the teacher models to a student model that receives the full spectrogram as input. The student models’ performance is evaluated on three benchmark sound datasets, viz., ESC-10, RAVDESS, and Audio MNIST. The impact of three state-of-the-art attention mechanisms on the final accuracy of the student model supervised by the proposed knowledge distillation framework is investigated thoroughly for sound classification. Experiments show that the student model’s performance in the presence of these state-of-the-art attention mechanisms is competitive with state-of-the-art techniques. Moreover, the student model trained on the Audio MNIST dataset attains a hitherto unreported accuracy of 98.24%, setting a new benchmark for the Audio MNIST dataset. Additionally, Grad-CAM visualizations of the spectrograms are provided to highlight the spectrogram regions relevant to a prediction and to explain why the model classifies a signal into a specific class. The code used in this work is available at: https://github.com/achyutmani/Sub-Band-Guided-KD-for-Sound-Classification.
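To make the sub-band-guided distillation idea concrete, the sketch below shows one plausible formulation: the full spectrogram is split into equal frequency sub-bands, each sub-band feeds a separate teacher network, and the student (which sees the full spectrogram) is trained with a standard cross-entropy loss plus a temperature-softened KL-divergence term averaged over the sub-band teachers. This is a minimal PyTorch illustration under our own assumptions (network architecture, number of sub-bands, temperature, and loss weighting are hypothetical), not the authors' released implementation; see the linked repository for the actual code.

```python
# Minimal sketch of sub-band-guided knowledge distillation (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10      # e.g. 10 classes as in ESC-10 / Audio MNIST (assumption)
NUM_SUB_BANDS = 4     # number of frequency sub-bands (assumption)
TEMPERATURE = 4.0     # softening temperature for distillation (assumption)
ALPHA = 0.5           # weight between hard-label and distillation losses (assumption)


def split_into_sub_bands(spectrogram: torch.Tensor, n_bands: int):
    """Split a (batch, 1, freq, time) spectrogram into equal frequency sub-bands."""
    return torch.chunk(spectrogram, n_bands, dim=2)


class SmallCNN(nn.Module):
    """Stand-in backbone used here for both teachers and student (hypothetical)."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def distillation_loss(student_logits, teacher_logits_list, labels):
    """Cross-entropy on hard labels plus averaged KL to the sub-band teachers."""
    hard = F.cross_entropy(student_logits, labels)
    soft = 0.0
    for t_logits in teacher_logits_list:
        soft = soft + F.kl_div(
            F.log_softmax(student_logits / TEMPERATURE, dim=1),
            F.softmax(t_logits / TEMPERATURE, dim=1),
            reduction="batchmean",
        ) * (TEMPERATURE ** 2)
    soft = soft / len(teacher_logits_list)
    return ALPHA * hard + (1.0 - ALPHA) * soft


if __name__ == "__main__":
    # Toy forward/backward pass with random data, for illustration only.
    spec = torch.randn(8, 1, 128, 64)            # (batch, 1, mel bins, frames)
    labels = torch.randint(0, NUM_CLASSES, (8,))

    teachers = [SmallCNN(NUM_CLASSES).eval() for _ in range(NUM_SUB_BANDS)]
    student = SmallCNN(NUM_CLASSES)

    with torch.no_grad():
        teacher_logits = [
            t(band) for t, band in zip(teachers, split_into_sub_bands(spec, NUM_SUB_BANDS))
        ]

    loss = distillation_loss(student(spec), teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In this sketch the adaptive pooling layer lets the same backbone accept both full-band and sub-band inputs; in practice the teachers and student would be trained separately, and attention modules could be inserted into the student's feature extractor as the paper investigates.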
