Urban sound classification based on 2-order dense convolutional network using dual features

Zilong Huang, Chen Liu, Hongbo Fei, Wei Li, Jinghu Yu, Yi Cao

doi:10.1016/j.apacoust.2020.107243

Abstract

Audio carries a large amount of information about daily-life scenes and physical events in the city, so deep learning approaches that extract this information automatically have great potential for building smart cities. To address the limited classification accuracy and adaptability of current models, this paper proposes a novel urban sound event classification model based on a 2-order dense convolutional network using dual features. Section 1 briefly introduces the development and applications of urban sound classification. Section 2 describes the feature extraction method and how environmental noise is added to the data. Section 3 presents a new network structure, the 2-order dense convolutional network (abbreviated as 2-DenseNet), and its algorithm. On this basis, an urban sound event classification model based on 2-DenseNet with dual features, named D-2-DenseNet, is proposed. In theory, D-2-DenseNet can converge faster than DenseNet and, because it fuses dual features, can also improve classification accuracy while maintaining good generalization ability. Finally, to validate its advantages, D-2-DenseNet is applied to urban sound event classification on the UrbanSound8K and DCASE2016 datasets. The experimental results show accuracies of 84.83% and 85.17%, respectively, an increase of up to 13.81% and 7.07% over the baseline. In noisy environments, D-2-DenseNet improves classification accuracy by 3.35% and 4.78%, respectively, compared with a single-feature network.
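The abstract reports results in noisy environments but does not say how the noise is added. A common approach is additive mixing at a chosen signal-to-noise ratio (SNR). The sketch below assumes that approach. It does not claim to reproduce the authors' procedure, and the helper names `mix_at_snr` and `measured_snr_db` are hypothetical. It uses plain Python lists so it stays self-contained. In practice the same arithmetic would run on NumPy arrays of audio samples.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: add `noise` to `clean` at the requested SNR in dB.

    Assumption: noise is tiled (repeated) or truncated to the clean clip's
    length, then scaled so that 20*log10(rms(clean) / rms(scaled_noise))
    equals snr_db.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat the noise clip as needed to cover the whole clean clip.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    noise_rms = rms(tiled)
    if noise_rms == 0:
        raise ValueError("noise must not be silent")
    # Target noise RMS follows from the SNR definition in decibels.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, tiled)]


def measured_snr_db(clean, mixture):
    """SNR of `mixture` relative to `clean`, treating the difference as noise."""
    residual = [m - c for m, c in zip(mixture, clean)]
    return 20 * math.log10(rms(clean) / rms(residual))


if __name__ == "__main__":
    random.seed(0)
    sample_rate = 8000
    # One second of a 440 Hz tone stands in for an urban sound clip.
    clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    # Half a second of Gaussian noise, tiled to the clip length inside mix_at_snr.
    noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate // 2)]
    noisy = mix_at_snr(clean, noise, snr_db=10.0)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.3f} dB")
```

Scaling by RMS rather than peak amplitude keeps the SNR definition independent of occasional spikes in the noise clip. Sweeping `snr_db` over several values would give a set of progressively harder test conditions.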

Full Text