Abstract

Environmental sound classification (ESC) has attracted growing research interest in recent years, but classifying environmental sounds typically incurs high computational overhead. Knowledge distillation (KD) is a prominent technique for developing a lightweight deep model by transferring knowledge from a heavyweight teacher into a less computationally complex student. Conventional KD techniques, however, generally require manually setting a temperature parameter to explore the similarity among classes. Herein, we propose a novel data augmentation technique that creates an augmented data instance by blending the hidden features of a sample from one class with the style information (mean and standard deviation) of a sample from another class. We design a new loss function for knowledge distillation that minimizes the Kullback–Leibler (KL) divergence between the class probabilities produced by the teacher and student networks when classifying the augmented sample. Furthermore, the proposed KD technique dispenses with the temperature parameter that must be set manually in the traditional vanilla KD technique. Our experiments on two benchmark ESC datasets, ESC-10 and DCASE 2019 Task-1(A), demonstrate that the student network achieves performance comparable to state-of-the-art techniques. Moreover, the student model is explainable and clearly indicates why a signal is classified into a specific class.
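The two core ideas of the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `style_blend` re-normalizes one sample's hidden features to the style statistics (mean and standard deviation) of a sample from another class, and `kl_div` is the temperature-free KL divergence between teacher and student class probabilities; all function names and shapes are illustrative assumptions.

```python
import numpy as np

def style_blend(content, style, eps=1e-5):
    # Blend the hidden features of one sample with the style
    # (mean and standard deviation) of a sample from another class.
    c_mean, c_std = content.mean(), content.std() + eps
    s_mean, s_std = style.mean(), style.std() + eps
    return s_std * (content - c_mean) / c_std + s_mean

def softmax(logits):
    # Numerically stable softmax over class logits.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def kl_div(p_teacher, p_student, eps=1e-12):
    # KL(teacher || student) on the augmented sample;
    # note there is no temperature parameter to tune.
    return float(np.sum(p_teacher * np.log((p_teacher + eps) / (p_student + eps))))

# Illustrative usage with dummy feature maps and logits:
content_feat = np.linspace(-1.0, 1.0, 16).reshape(4, 4)
style_feat = np.linspace(2.0, 5.0, 16).reshape(4, 4)
augmented = style_blend(content_feat, style_feat)

teacher_probs = softmax(np.array([1.0, 2.0, 3.0]))
student_probs = softmax(np.array([0.8, 2.1, 2.9]))
loss = kl_div(teacher_probs, student_probs)
```

In a real training loop the blending would be applied to intermediate feature maps inside the network, and the KL term would be minimized with respect to the student's parameters while the teacher stays frozen.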
