Abstract

In this paper, we present an approach to acoustic scene classification that aggregates spectral and temporal features. We propose the first use of the variable-Q transform (VQT) to generate the time–frequency representation for acoustic scene classification. The VQT provides finer control over resolution than the constant-Q transform (CQT) or the short-time Fourier transform (STFT) and can be tuned to better capture acoustic scene information. We then adopt a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is better suited to extracting features from acoustic time–frequency images. Our approach yields a 5.2% improvement on the DCASE 2016 dataset over standard CQT with LBP. Fusing the proposed AECLBP features with histogram of oriented gradients (HOG) features, we achieve a classification accuracy of 85.5%, which outperforms one of the top-performing systems.
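
To illustrate the kind of texture descriptor the abstract builds on, the following is a minimal sketch of the standard 8-neighbour LBP applied to a small time–frequency image. It is not the AECLBP variant proposed in the paper, and the names `OFFSETS`, `lbp_code`, `lbp_histogram`, and the `toy` image are hypothetical, chosen only for this example. In practice the image would be a VQT magnitude matrix rather than a hand-written grid.

```python
# Standard 8-neighbour LBP on a 2-D image given as a list of rows.
# Assumption: this is the basic LBP, NOT the paper's AECLBP variant.

# Neighbour offsets (row, col), clockwise starting at the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour contributes a 1-bit when it is at least as large as the center.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical toy "time-frequency image" (rows = frequency bins, cols = frames).
toy = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
hist = lbp_histogram(toy)
```

The resulting histogram is the feature vector a classifier would consume; variants such as the AECLBP change how the local comparison is made, not this overall histogram-of-codes structure.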
