Urban Sound Classification

Pravisha Desale, Yash Vaykar, Prof. Manisha Bharti

doi:10.55041/ijsrem25684

Abstract

We are surrounded by sounds, and the human brain identifies them easily and clearly. It processes incoming sound signals continuously and supplies us with relevant knowledge of our environment. Although they do not match the brain's accuracy, some smart devices can extract the necessary information from an audio signal with the help of different algorithms. Over the years, models such as Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), Region-based Convolutional Neural Networks (R-CNNs), and numerous other machine learning techniques have been employed for sound classification. These methods have shown impressive results in distinguishing spectro-temporal patterns and different sound categories. The novelty of our research lies in showing that a long short-term memory (LSTM) network achieves better classification accuracy than a CNN for many of the features used. Additionally, we evaluate model accuracy under different techniques such as data augmentation and feature stacking. With our RNN model, we achieved an accuracy of 87%, setting a new benchmark in performance on the UrbanSound8K dataset. Our findings not only advance the field of sound classification but also underscore the potential of LSTM networks and the importance of techniques such as data augmentation and feature stacking in improving the accuracy of sound recognition systems.

Key Words: Sound Classification, UrbanSound8K, Librosa, Spectrograms, Deep Learning, CNN, LSTM.
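The abstract names feature stacking and data augmentation without describing them, so the following is only a minimal sketch of what the two terms commonly mean, not the authors' implementation. It assumes that feature stacking means concatenating per-frame feature vectors from several extractors, and that augmentation means a circular time shift plus additive Gaussian noise. The function names `stack_features` and `augment_waveform`, and all of their parameters, are hypothetical and chosen for illustration.

```python
import random


def stack_features(*feature_sets):
    """Concatenate per-frame feature vectors from several extractors.

    Each argument is a list of frames, and each frame is a list of floats
    (for example, one set of spectral coefficients and one set of pitch
    features). All feature sets must cover the same number of frames.
    This frame-wise concatenation is an assumed reading of "feature stacking".
    """
    if not feature_sets:
        return []
    n_frames = len(feature_sets[0])
    if any(len(fs) != n_frames for fs in feature_sets):
        raise ValueError("all feature sets must have the same number of frames")
    return [
        [value for fs in feature_sets for value in fs[t]]
        for t in range(n_frames)
    ]


def augment_waveform(samples, shift, noise_std, rng):
    """Return a circularly time-shifted copy with additive Gaussian noise.

    The choice of augmentations here is an assumption; the source does not
    say which ones were used.
    """
    if not samples:
        return []
    shift %= len(samples)
    shifted = samples[-shift:] + samples[:-shift] if shift else list(samples)
    return [s + rng.gauss(0.0, noise_std) for s in shifted]


if __name__ == "__main__":
    feats_a = [[1.0, 2.0], [3.0, 4.0]]   # 2 frames, 2 dimensions each
    feats_b = [[0.5], [0.6]]             # 2 frames, 1 dimension each
    print(stack_features(feats_a, feats_b))
    print(augment_waveform([1.0, 2.0, 3.0, 4.0], 1, 0.01, random.Random(0)))
```

Stacking along the feature axis keeps one vector per time frame, which is the shape a sequence model such as an LSTM typically consumes. In practice the per-frame features would come from an audio library rather than hand-written lists.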

Full Text