Abstract

A central problem in Acoustic Scene Classification (ASC) is finding an effective representation of an acoustic scene. This study uses Linear Prediction Cepstral Coefficients (LPCC) and Spectral Centroid Magnitude Cepstral Coefficients (SCMC) features, along with log-Mel band energies, to represent an acoustic scene, and Deep Neural Networks (DNNs) to model the classification task. LPCCs capture changes in the auditory spectrum over time, SCMCs capture the weighted average magnitude within each subband of a given acoustic scene, and log-Mel band energies capture the spectral envelope of each audio frame. The DNN architecture performs classification at the audio-track level. We experimented on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 development dataset and the DCASE 2017 dataset, carrying out experiments with individual feature sets and also performing decision-level fusion of DNN scores to improve performance.
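The decision-level score fusion mentioned above can be sketched as combining the per-class posterior scores produced by DNNs trained on the different feature sets. The sketch below is a minimal illustration, not the paper's implementation; the function name and the optional fusion weights are assumptions:

```python
import numpy as np

def fuse_scores(score_list, weights=None):
    """Fuse per-class posterior scores from several classifiers.

    score_list: list of arrays, each of shape (n_tracks, n_classes),
    e.g. softmax outputs of DNNs trained on LPCC, SCMC and log-Mel
    features. weights: optional per-model fusion weights (illustrative).
    Returns the predicted class index for each track.
    """
    scores = np.stack(score_list)          # (n_models, n_tracks, n_classes)
    if weights is None:
        weights = np.ones(len(score_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalize to a weighted average
    fused = np.tensordot(weights, scores, axes=1)  # (n_tracks, n_classes)
    return fused.argmax(axis=1)            # highest fused score per track

# Example: two models scoring 2 tracks over 3 scene classes
m1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
m2 = np.array([[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]])
print(fuse_scores([m1, m2]))  # -> [0 2]
```

Averaging posteriors (rather than, say, majority voting over hard labels) keeps information about each model's confidence, which is why it is a common choice for decision-level fusion.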
