Comparative analysis of audio classification with MFCC and STFT features using machine learning techniques

Mahendra Kumar Gourisaria, Manoj Sahni, Pradeep Kumar Singh, Rakshit Agrawal

doi:10.1007/s43926-023-00049-y


Open Access

https://doi.org/10.1007/s43926-023-00049-y


Journal: Discover Internet of Things	Publication Date: Jan 3, 2024
Citations: 1	License type: CC BY 4.0

Abstract

In an era of automated, digitized information, a large share of the data handled by advanced computer applications is audio. Advances in technology allow modern devices to deliver detailed insights into audio content, using feature extraction techniques such as Mel Frequency Cepstral Coefficients (MFCCs) and the Short-Time Fourier Transform (STFT) to extract pertinent information. Our study supports not only efficient audio file management and retrieval but also applications in security, the robotics industry, and investigations. Beyond its industrial applications, our model is versatile in the corporate sector, particularly in tasks such as siren sound detection. This capability can support the development of advanced automated systems, with gains in efficiency and safety across various corporate domains. The primary aim of our experiment is to create highly efficient audio classification models that can be automated and deployed within the industrial sector, addressing the need for enhanced productivity and performance. Despite the dynamic nature of environmental sounds and the presence of noise, our audio classification model proves efficient and accurate. The novelty of our research lies in comparing two different audio datasets with similar characteristics and in classifying the audio signals into several categories using various machine learning techniques applied to MFCC and STFT features extracted from the signals. We also evaluated the results before and after noise removal to analyze the effect of noise on precision, recall, specificity, and F1-score. Our experiments show that the ANN model outperforms the other six audio models, with accuracies of 91.41% and 91.27% on the respective datasets.
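
The abstract reports precision, recall, specificity, and F1-score for multi-class audio classifiers. As a reminder of how these metrics are commonly defined, here is a minimal sketch that derives them for one class from a confusion matrix, treating that class as "positive" and all others as "negative". The function name `per_class_metrics` and the 3-class matrix are hypothetical illustrations, not the authors' code or data.

```python
def per_class_metrics(confusion, cls):
    """Return precision, recall, specificity and F1 for one class.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j (one-vs-rest view for class `cls`).
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls][j] for j in range(n) if j != cls)
    fp = sum(confusion[i][cls] for i in range(n) if i != cls)
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp

    def ratio(num, den):
        # Guard against empty classes to avoid division by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up 3-class confusion matrix, for illustration only.
example = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = per_class_metrics(example, 0)
```

For class 0 in this toy matrix, recall is 8/10 and specificity is 19/20. Averaging the per-class values (macro averaging) is one common way to report a single figure per model, though the abstract does not state which averaging the authors used.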

Full Text