Abstract

Environmental Sound Classification (ESC) plays a vital role in machine auditory scene analysis. Recently, the Highway Network CNN model has achieved state-of-the-art results by alleviating the vanishing-gradient problem that affects much deeper CNNs. However, a careful analysis shows that the Highway Network model lacks the ability to maximize information flow between layers, which is essential for learning discriminative representations of acoustic events. In addition, the Highway Network model for the ESC task is larger than 20MB, which is still too large for mobile applications. To address these two issues, we propose a novel Densely Connected Highway Convolutional Network (DCH-Net) model for the ESC task. Specifically, we develop a dense highway module that ensures maximum information flow between layers by connecting all layers directly with each other. To reduce the model size, a global average pooling layer replaces the traditional fully connected layers, which greatly reduces the number of model parameters. Experimental results show that our DCH-Net ESC model achieves accuracies of 69% and 90% on the ESC50 and ESC10 datasets respectively, which are 2% and 10% higher than those of the Highway Network based ESC model. Meanwhile, our model size is only 2MB.
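
The two design ideas in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: the feature-map shape (256 channels of 8x8), the hidden width (1024), and all function names are hypothetical values chosen only for illustration, and it assumes the global average pooling output feeds a single linear classifier. It compares the parameter count of a flatten-plus-fully-connected head with a global-average-pooling head, and counts the direct connections in a block where every layer is connected to every other layer.

```python
def fc_head_params(channels, height, width, hidden, classes):
    """Weights + biases of: flatten -> FC(hidden) -> FC(classes)."""
    flat = channels * height * width
    return (flat * hidden + hidden) + (hidden * classes + classes)


def gap_head_params(channels, classes):
    """Weights + biases of: global average pooling -> FC(classes).

    The pooling step itself has no trainable parameters.
    """
    return channels * classes + classes


def global_average_pool(feature_maps):
    """Reduce each 2-D feature map (list of rows) to its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def dense_connections(num_layers):
    """Direct connections when each layer receives all earlier outputs."""
    return num_layers * (num_layers + 1) // 2


if __name__ == "__main__":
    # Hypothetical final feature map: 256 channels of 8x8, 50 classes (as in ESC50).
    fc = fc_head_params(256, 8, 8, hidden=1024, classes=50)
    gap = gap_head_params(256, classes=50)
    print(f"fully connected head: {fc:,} parameters")
    print(f"global average pooling head: {gap:,} parameters")
    print(f"direct connections in a 5-layer dense block: {dense_connections(5)}")
```

Under these assumed sizes the fully connected head dominates the parameter budget, which is consistent with the abstract's claim that replacing it shrinks the model; the actual sizes reported in the paper depend on its real architecture.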
