Abstract
This paper proposes a model for the task of acoustic scene classification. The proposed model combines convolutional neural networks with a random forest classifier to predict the class of an audio clip. Three spectrogram features are used: log-Mel, Mel-frequency cepstral coefficient (MFCC), and Gammatone cepstral coefficient (GTCC) spectrograms. Each spectrogram is processed by a convolutional neural network, and the resulting feature vectors are combined into a single vector, which the random forest classifier assigns to one of the acoustic scenes. The proposed model is evaluated on the Tampere University of Technology Urban Acoustic Scenes 2018 and the Tampere University Urban Acoustic Scenes 2019 development datasets, and its performance is compared with the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 and 2019 challenge baseline models to show its efficacy. The proposed model achieves accuracies of 68.1% and 67.1% on the two datasets, respectively.
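The pipeline described above (spectrogram features, a learned embedding per feature, and a random forest over the fused vector) can be sketched in miniature. This is not the paper's implementation: the CNN embedding step is replaced with simple per-band summary statistics, a linear-band log spectrogram stands in for the log-Mel/MFCC/GTCC features, and the synthetic two-class "scenes" (a low-frequency hum vs. broadband noise) are invented for illustration only.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.ensemble import RandomForestClassifier

def log_band_features(signal, sr=22050, n_bands=40):
    # Simplified stand-in for a log-Mel spectrogram: a power spectrogram
    # averaged into n_bands linear frequency bands, then log-compressed.
    _, _, S = spectrogram(signal, fs=sr, nperseg=1024, noverlap=512)
    bands = np.array_split(S, n_bands, axis=0)
    band_energy = np.stack([b.mean(axis=0) for b in bands])  # (n_bands, frames)
    return np.log(band_energy + 1e-10)

# Toy dataset: two synthetic acoustic "scenes".
rng = np.random.default_rng(0)
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
X, y = [], []
for _ in range(20):
    hum = np.sin(2 * np.pi * 100 * t) + 0.1 * rng.standard_normal(t.size)
    noise = rng.standard_normal(t.size)
    for sig, label in ((hum, 0), (noise, 1)):
        feats = log_band_features(sig, sr)
        # The paper fuses CNN embeddings here; we summarize each
        # time-frequency map with per-band means and std deviations.
        vec = np.concatenate([feats.mean(axis=1), feats.std(axis=1)])
        X.append(vec)
        y.append(label)

# Classify the fused feature vectors with a random forest.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X[:30], y[:30])
acc = clf.score(X[30:], y[30:])
```

On such trivially separable data the forest classifies the held-out clips essentially perfectly; the point is only the structure of the pipeline, not the number.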