Abstract

Like air pollution, sound pollution has become a major concern for city residents, designers, and developers. Detecting and recognizing sound types and sources in cities, suburban areas, or any other environment has become necessary for both quality of life and security. In recent years, researchers have explored many models based on Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) neural networks, and combinations of these techniques, which have produced promising results in classifying urban sounds when applied to spectrogram images or their variants. This research uses the performance of a Deep Neural Network (DNN) model as a baseline for comparing CNN and LSTM models that classify urban sounds from Mel-scale cepstral analysis (MEL) spectrum images, generated with the open-source sound-processing library Librosa. The models were evaluated on the UrbanSound8k dataset. The CNN model underperformed both the DNN baseline and the LSTM model, with an accuracy of 87.15% and an F1 score of 85.63%. The LSTM model, by contrast, outperformed the CNN on test data, with an accuracy of 90.15% and an F1 score of 90.15%.
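The two metrics reported above can differ on the same predictions, as they do for the CNN model. The following is a minimal sketch of how accuracy and an F1 score might be computed for a multi-class sound classifier. It assumes macro-averaging (an unweighted mean of per-class F1), which the abstract does not specify, and the function names and toy labels are illustrative only, not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels using three UrbanSound8k class names (made-up predictions).
y_true = ["siren", "siren", "dog_bark", "dog_bark", "drilling", "drilling"]
y_pred = ["siren", "siren", "dog_bark", "siren", "drilling", "dog_bark"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because macro-averaging weights every class equally, a model that does poorly on a few classes can show an F1 score below its accuracy. Other averaging schemes, such as weighting each class by its frequency, would give different values.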
