Acoustic data detection in large-scale emergency vehicle sirens and road noise dataset

Mahmoud Y. Shams, Tarek Abd El-Hafeez, Esraa Hassan

doi:10.1016/j.eswa.2024.123608

Abstract

This paper presents a novel deep learning model, Self-Attention Layer within a Convolutional Neural Network (SACNN), designed for acoustic data detection in large datasets of emergency vehicle sirens and road noise. SACNN combines EfficientNet with a One-Dimensional Convolutional Neural Network (1D_CNN) to improve the precision and efficiency of detection. The dataset comprises 3-second WAV-format audio files in two categories, Emergency Vehicles (Ambulance and Firetruck sirens) and Traffic (plain traffic sounds); each category contains 200 sound files and 200 corresponding spectrogram images. A comparative analysis against EfficientNet and 1D_CNN on a large-scale dataset shows that SACNN outperforms both models, reaching an average accuracy, precision, recall, and F1 score of 1.00. SACNN also exhibits the highest computational efficiency of the three, averaging 0.20883 s per sample. These findings highlight SACNN's efficacy in detecting and classifying emergency vehicle sirens and road noise in large-scale datasets, with implications for traffic management, public safety, and noise pollution monitoring, and they indicate its potential as a basis for real-time acoustic data detection systems.
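For readers less familiar with the four metrics reported above, the following is a minimal sketch of how accuracy, precision, recall, and F1 score are computed for a two-class problem such as Emergency Vehicles versus Traffic. The function name `binary_metrics`, the label encoding (1 = emergency vehicle, 0 = traffic), and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = emergency vehicle, 0 = traffic.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
perfect = binary_metrics(y_true, y_true)
```

In this sketch, a score of 1.00 on all four metrics arises only when the predictions contain no false positives and no false negatives, as in the `perfect` case.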

Full Text