Abstract
Motivated by the state-of-the-art performance of Dense Convolutional Networks (DenseNet) on highly competitive object recognition benchmarks (CIFAR-10, CIFAR-100, SVHN and ImageNet), this work presents improvements on the Acoustic Scene Classification (ASC) task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2017), based on an optimized DenseNet model. A multi-channel Convolutional Neural Network (CNN) is also explored to extract features from the different audio channels in an end-to-end manner. In the experiments, the proposed model is compared with the DCASE 2017 challenge baseline model and several other state-of-the-art CNN architectures in terms of classification accuracy on the same open-source DCASE datasets. The results show that the proposed DenseNet-based architecture achieves higher classification accuracy with lower model complexity than the other models.