TB-MFCC multifuse feature for emergency vehicle sound classification using multistacked CNN – Attention BiLSTM

T.M Nithya, P Dhivya, S.N Sangeethaa, P Rajesh Kanna

doi:10.1016/j.bspc.2023.105688

Abstract

Emergency vehicles such as ambulances, fire engines, and police cruisers play a vital role in society: they respond quickly to emergencies, help prevent loss of life, and maintain order. Identifying and classifying vehicle sounds is therefore important in cities, where emergency vehicles must be recognized easily so that traffic can be cleared effectively. Convolutional Neural Networks play an important role in the accurate prediction of vehicles during an emergency. The main aim of this paper is to develop a suitable model and algorithms for data augmentation, feature extraction, and classification. The proposed TB-MFCC multifuse feature comprises data augmentation and feature extraction. First, in the proposed signal augmentation, noise injection, stretching, shifting, and pitching are each applied separately to every audio signal, which increases the number of instances in the dataset and reduces overfitting in the network. Second, Triangular Bluestein Mel Frequency Cepstral Coefficients (TB-MFCC) are proposed and fused with Zero Crossing Rate (ZCR), Mel-frequency cepstral coefficients (MFCC), Root Mean Square (RMS), Chroma, and Tempogram features to extract precise features, which increases the accuracy and reduces the Mean Squared Error (MSE) of the model during classification. Finally, the proposed Multi-stacked Convolutional Neural Network (MCNN) with Attention-based Bidirectional Long Short Term Memory (A-BiLSTM) improves the modeling of the nonlinear relationships among the features. The proposed Pooled Multifuse Feature Augmentation (PMFA) with MCNN and A-BiLSTM achieves an accuracy of 98.66 %, reduces the False Positive Rate (FPR) by 1.01 %, and reduces the loss to 0 %. The model thus predicts the sound without overfitting, underfitting, or vanishing gradient problems.
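The augmentation and fusion steps described in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation: the function names, default parameters, frame sizes, and the choice of a naive linear-interpolation stretch are all assumptions made for illustration. Pitch shifting and the TB-MFCC, MFCC, Chroma, and Tempogram features are omitted because they need a spectral transform; only ZCR and RMS are fused here, by concatenating them into one vector per frame.

```python
import math
import random

def inject_noise(signal, noise_level=0.005, seed=0):
    """Add Gaussian noise scaled to the peak amplitude (assumed scheme)."""
    rng = random.Random(seed)
    peak = max(abs(x) for x in signal)
    return [x + rng.gauss(0.0, noise_level * peak) for x in signal]

def time_shift(signal, shift):
    """Circularly shift the signal to the right by `shift` samples."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]

def time_stretch(signal, rate):
    """Naive stretch by linear-interpolation resampling.

    rate > 1 shortens the signal, rate < 1 lengthens it. Unlike a
    phase-vocoder stretch, this also changes the pitch.
    """
    n_out = max(2, int(round(len(signal) / rate)))
    out = []
    for i in range(n_out):
        pos = i * (len(signal) - 1) / (n_out - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out

def augment(signal):
    """Return the original plus one variant per augmentation, applied separately."""
    return [
        signal,
        inject_noise(signal),
        time_stretch(signal, 1.2),
        time_shift(signal, len(signal) // 10),
    ]

def frame_features(signal, frame_len=256, hop=128):
    """Fuse per-frame ZCR and RMS into one [zcr, rms] vector per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        feats.append([zcr, rms])
    return feats
```

Applying `augment` to each recording yields four instances per original, and `frame_features` can then be run on every variant. In practice a library such as librosa provides pitch-preserving stretching and pitch shifting as well as the spectral features listed in the abstract.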
