Abstract

Audio classification is a widely studied problem in acoustics, and audio event classification (AEC) is rapidly gaining popularity. In this chapter, we compare several standard classification models and their results on audio event classification. We examine support vector machines (SVMs) with different kernels, decision trees, and logistic regression. For this study, we used the TAU Urban Acoustic Scenes 2019 dataset and the DCASE 2016 Challenge dataset. We extracted feature vectors from the corpus using Mel-frequency cepstral coefficients (MFCCs). These coefficients are fed into the aforementioned algorithms, which are then trained on the datasets. The experimental results show that an SVM with a linear kernel yields the best result among the machine learning algorithms implemented. In the future, we plan to apply deep neural networks (DNNs) to acoustic event classification. DNNs offer robust feature extraction and denser representations. We expect that training DNNs with fused prosodic features will outperform SVMs and random forests.
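The pipeline described above (MFCC feature vectors fed into several classifiers, with a linear-kernel SVM among them) can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the synthetic Gaussian feature vectors below merely stand in for real MFCCs (which would typically be computed from the audio with a library such as librosa), and the dataset sizes, class count, and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_per_class, n_mfcc = 200, 20  # assumed sizes: 200 clips/class, 20 MFCCs/clip

# Synthetic stand-ins for per-clip MFCC feature vectors of two acoustic classes.
# In a real pipeline these would come from e.g. librosa.feature.mfcc on each clip.
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, n_mfcc)),
               rng.normal(1.5, 1.0, (n_per_class, n_mfcc))])
y = np.array([0] * n_per_class + [1] * n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The classifier families compared in the study (kernel choices assumed).
models = {
    "SVM (linear kernel)": SVC(kernel="linear"),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: test accuracy = {acc:.3f}")
```

On real MFCC features the relative ranking of these models would of course depend on the dataset; the study reports the linear-kernel SVM performing best on the two corpora used.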
