This study presents a novel approach to emergency vehicle classification that leverages a comprehensive set of informative audio features to distinguish among ambulance sirens, fire truck sirens, and traffic noise. A key contribution is the combination of time-domain features, namely root mean square (RMS) energy and zero-crossing rate, which capture temporal characteristics such as changes in signal energy, with frequency-domain features derived from the short-time Fourier transform (STFT). These include the spectral centroid, spectral bandwidth, and spectral roll-off, which describe the sound's frequency content and help differentiate siren patterns from traffic noise. Mel-frequency cepstral coefficients (MFCCs) are also incorporated to represent the spectral information on a perceptually motivated scale that approximates human hearing. Capturing both the temporal and the spectral characteristics of the audio signals improves the model's ability to discriminate between emergency vehicles and traffic noise compared with using features from a single domain. A further contribution is the integration of data augmentation techniques that replicate real-world conditions, including the Doppler effect and varying noise environments. The study also compares several machine learning algorithms applied to the extracted features to determine the most effective classifier for this task. The support vector machine (SVM) achieves the highest accuracy at 99.5%, followed by random forest (RF) and k-nearest neighbors (KNN) at 98.5% each, while AdaBoost reaches 96.0% and long short-term memory (LSTM) reaches 93%. A stacked ensemble classifier built on these base learners also achieves an accuracy of 99.5%.
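The time-domain and STFT-based features named above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the frame length (2048 samples), hop size (512), Hann window, 85% roll-off threshold, and the choice to summarize each per-frame feature by its mean and standard deviation are all assumptions made here. MFCCs are omitted for brevity; in Python they are commonly computed with `librosa.feature.mfcc`.

```python
import numpy as np

def frame_signal(x, frame_len=2048, hop=512):
    """Split a 1-D signal into overlapping frames.

    Pads with zeros only if the signal is shorter than one frame;
    otherwise any incomplete final frame is dropped.
    """
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape: (n_frames, frame_len)

def extract_features(x, sr, frame_len=2048, hop=512, rolloff_pct=0.85):
    """Return a 10-D vector: mean and std over frames of
    [RMS, zero-crossing rate, spectral centroid, bandwidth, roll-off].
    Frame/hop sizes, window, and roll-off percentage are assumed values."""
    frames = frame_signal(x, frame_len, hop)

    # Time-domain features
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)  # crossings per sample

    # Frequency-domain features from the STFT magnitude (Hann-windowed frames)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    mag_sum = mag.sum(axis=1) + 1e-12  # guard against silent frames

    centroid = (mag * freqs).sum(axis=1) / mag_sum
    bandwidth = np.sqrt((mag * (freqs - centroid[:, None]) ** 2).sum(axis=1) / mag_sum)

    # Roll-off: lowest frequency below which rolloff_pct of the magnitude lies
    cumulative = np.cumsum(mag, axis=1)
    rolloff_idx = np.argmax(cumulative >= rolloff_pct * cumulative[:, -1:], axis=1)
    rolloff = freqs[rolloff_idx]

    per_frame = np.stack([rms, zcr, centroid, bandwidth, rolloff], axis=1)
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])

# Usage: a synthetic 1 kHz tone stands in for a real siren recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t + 0.3)
feats = extract_features(tone, sr)
print(feats.round(3))
```

For a pure 1 kHz tone the spectral centroid lands near 1000 Hz and the zero-crossing rate near 2000 / 16000 = 0.125 crossings per sample; a frequency-sweeping siren would instead show large frame-to-frame variation in the centroid, which the standard-deviation half of the vector is meant to capture.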
Furthermore, leave-one-out cross-validation (LOOCV) was conducted to validate the results: SVM and RF each achieve an accuracy of 98.5%, followed by KNN at 97.0% and AdaBoost at 90.5%. These findings indicate the strong performance of advanced machine learning techniques for emergency vehicle classification.
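Leave-one-out cross-validation holds out each sample once, trains on the remaining n − 1 samples, and averages the n single-sample outcomes. The sketch below is illustrative only and makes several assumptions that do not come from the study: it uses a k-nearest-neighbors classifier with k = 3 (one of the compared models, chosen because it fits in a few lines of NumPy), z-score standardization fitted on each training fold, and synthetic two-dimensional data with hypothetical class labels.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def loocv_accuracy(X, y, k=3):
    """Leave-one-out CV: n rounds, each training on n-1 samples and testing on 1."""
    n = len(X)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        # Standardize with training-fold statistics only, so the held-out
        # sample does not leak into the scaling.
        mu = X[mask].mean(axis=0)
        sigma = X[mask].std(axis=0) + 1e-12
        X_train = (X[mask] - mu) / sigma
        x_test = (X[i] - mu) / sigma
        pred = knn_predict(X_train, y[mask], x_test, k)
        correct += int(pred == y[i])
    return correct / n

# Usage on synthetic, well-separated clusters (hypothetical stand-ins for
# extracted audio feature vectors).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
y = np.repeat(np.array(["ambulance", "firetruck", "traffic"]), 20)
acc = loocv_accuracy(X, y, k=3)
print(f"LOOCV accuracy: {acc:.3f}")
```

Because every sample is tested exactly once by a model that never saw it, LOOCV uses nearly all the data for training in each round, at the cost of fitting the model n times; this is one reason LOOCV figures can differ from a single train/test split, as the accuracies above do.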