Abstract

Convolutional neural networks (CNNs) have been widely used, with remarkable success, in the acoustic scene classification (ASC) task. However, the performance of these CNNs relies heavily on their architectures, and designing a CNN suited to the problem at hand requires considerable effort and expertise. In this work, we propose an efficient genetic algorithm (GA) that aims to find optimized CNN architectures for the ASC task. The proposed algorithm splits the input spectrograms along the frequency dimension, so that the architecture search space covers models composed of multiple sub-CNNs in addition to classical single-path CNNs. Specifically, the algorithm searches for the best number of sub-CNNs, together with their architectures, to better capture the distinct features of the input spectrograms. Unlike many other GAs that optimize conventional single-path CNN architectures, the proposed GA is designed specifically for sound classification and is therefore better suited to the ASC task. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method: the proposed algorithm achieved relative accuracy improvements of around 17.8%, 16%, and 17.2% over the baseline systems on the development datasets of DCASE2018-Task1A, DCASE2019-Task1A, and DCASE2020-Task1A, respectively.
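To make the idea of searching over frequency-split sub-CNN models more concrete, the following is a minimal, hypothetical sketch, not the authors' implementation. It assumes that a genome is simply a list of per-branch genes (one gene per frequency band, so the list length is the number of sub-CNNs), that each gene holds two illustrative hyperparameters (`num_conv_layers`, `num_filters`), that bands are contiguous and of near-equal height, and that evolution uses truncation selection plus add/drop/resample mutations. All names (`SubCNNGene`, `Genome`, `split_frequency`, `mutate`, `evolve`) are invented for illustration. In practice the fitness function would train each candidate model and return its validation accuracy; here a toy fitness stands in for that.

```python
import random
from dataclasses import dataclass


@dataclass
class SubCNNGene:
    # Hypothetical hyperparameters of one sub-CNN branch (assumed, not from the paper).
    num_conv_layers: int
    num_filters: int


@dataclass
class Genome:
    # One gene per frequency band; len(branches) is the number of sub-CNNs.
    # A single branch corresponds to a classical single-path CNN.
    branches: list


def split_frequency(spectrogram, num_bands):
    """Split a spectrogram (list of frequency rows) into contiguous bands of near-equal height."""
    n_freq = len(spectrogram)
    if not 1 <= num_bands <= n_freq:
        raise ValueError("num_bands must be between 1 and the number of frequency bins")
    base, extra = divmod(n_freq, num_bands)
    bands, start = [], 0
    for i in range(num_bands):
        size = base + (1 if i < extra else 0)  # spread leftover rows over the first bands
        bands.append(spectrogram[start:start + size])
        start += size
    return bands


def random_gene(rng):
    return SubCNNGene(rng.randint(1, 4), rng.choice([16, 32, 64, 128]))


def random_genome(rng, max_branches=4):
    return Genome([random_gene(rng) for _ in range(rng.randint(1, max_branches))])


def mutate(genome, rng, max_branches=4):
    """Return a mutated copy: add a branch, drop a branch, or resample one branch."""
    branches = list(genome.branches)  # copy so the parent genome is left unchanged
    op = rng.choice(["add", "drop", "resample"])
    if op == "add" and len(branches) < max_branches:
        branches.append(random_gene(rng))
    elif op == "drop" and len(branches) > 1:
        branches.pop(rng.randrange(len(branches)))
    else:
        # Fallback (or explicit choice): replace one branch with a fresh random gene.
        branches[rng.randrange(len(branches))] = random_gene(rng)
    return Genome(branches)


def evolve(fitness, generations=10, pop_size=8, seed=0):
    """Simple elitist GA: keep the better half, refill with mutated survivors."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness standing in for validation accuracy: prefer three sub-CNNs.
    best = evolve(lambda g: -abs(len(g.branches) - 3), generations=20)
    spectrogram = [[f] * 5 for f in range(10)]  # 10 frequency bins x 5 time frames
    bands = split_frequency(spectrogram, len(best.branches))
    print(len(best.branches), [len(b) for b in bands])
```

Because the number of branches is part of the genome, the same search can return either a single-path CNN (one branch covering the whole spectrogram) or a multi-branch model in which each sub-CNN sees only its own frequency band.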
