Abstract

Rapid urbanization and worldwide population growth pose a serious challenge to building livable and sustainable cities. This growth increases and diversifies urban sounds. Transforming these sounds into information, rather than merely hearing them as noise, plays an important role in the smart-city concept. To this end, two basic approaches are used to classify urban sounds. In the first, the sounds are processed with signal-processing methods to obtain handcrafted features. In the second, the sounds are represented visually and classified with deep learning models. This study investigated the effect of using the features from these two approaches individually and in hybrid form on the classification of urban sounds. In addition, a CNN model was created to classify the hybrid features. The results showed that both approaches classify successfully. Among the visual representation methods, mel-spectrogram, scalogram, and spectrogram images achieved the highest classification accuracy, and combining mel-spectrogram features with acoustic features and an SVM classifier further improved accuracy. Experiments were performed on the ESC-10 and UrbanSound8k datasets. The highest accuracy on ESC-10, 98.33%, was obtained using scalogram and acoustic features with the AVCNN model. The highest accuracy on UrbanSound8k, 97.70%, was obtained by classifying the mel-spectrogram and acoustic features extracted by the AVCNN model with an SVM classifier.
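The hybrid idea summarized above can be sketched as follows: compute a visual (time-frequency) representation of a sound alongside a few handcrafted acoustic descriptors, then concatenate them into one feature vector for a downstream classifier. This is a minimal illustration under assumed choices (a plain log-power spectrogram rather than the paper's mel-spectrogram/scalogram, three example descriptors, a synthetic tone instead of an ESC-10 or UrbanSound8k clip); all function names here are hypothetical, not the paper's code.

```python
import numpy as np
from scipy.signal import spectrogram

def handcrafted_features(y, sr):
    """Example acoustic descriptors of the handcrafted-feature kind:
    zero-crossing rate, RMS energy, and spectral centroid."""
    zcr = np.mean(np.abs(np.diff(np.sign(y)))) / 2.0
    rms = np.sqrt(np.mean(y ** 2))
    mag = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    return np.array([zcr, rms, centroid])

def visual_representation(y, sr):
    """Log-power spectrogram, i.e. the 2-D 'image' a CNN would consume."""
    _, _, Sxx = spectrogram(y, fs=sr, nperseg=256)
    return np.log(Sxx + 1e-10)

def hybrid_vector(y, sr):
    """Concatenate the flattened visual representation with the
    acoustic descriptors into a single hybrid feature vector."""
    img = visual_representation(y, sr).flatten()
    return np.concatenate([img, handcrafted_features(y, sr)])

# Synthetic 1-second 440 Hz tone standing in for an urban-sound clip.
sr = 8000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 440.0 * t)
feats = hybrid_vector(y, sr)
```

In the paper's pipeline the 2-D representation goes into the CNN and the resulting deep features are fused with the acoustic ones before an SVM; here both parts are simply concatenated to show the data shapes involved.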
