Real-Time Vehicle Sound Detection System Based on Depthwise Separable Convolution Neural Network and Spectrogram Augmentation

Chaoyi Wang, Jianpo Liu, Huawei Liu, Yaozhe Song, Xiaobing Yuan, Haolong Liu, Baoqing Li

doi:10.3390/rs14194848

Open Access

https://doi.org/10.3390/rs14194848

Abstract

This paper proposes a lightweight model combined with data augmentation for vehicle detection in an intelligent sensor system. Vehicle detection can be treated as a binary classification problem: vehicle or non-vehicle. Deep neural networks have shown high accuracy in audio classification, and convolutional neural networks (CNNs) are widely used for audio feature extraction and classification. However, the performance of deep neural networks depends heavily on the availability of large quantities of training data. Recordings of some vehicle types, such as tracked vehicles, are limited, so data augmentation techniques can be applied to improve the overall detection accuracy. In our case, spectrogram augmentation is applied to the mel spectrogram before the Mel-frequency cepstral coefficient (MFCC) features are extracted, to improve the robustness of the system. Depthwise separable convolution is then applied to the CNN for model compression, and the compressed model is migrated to the hardware platform of the intelligent sensor system. The proposed approach is evaluated on a dataset recorded in the field using intelligent sensor systems with microphones. The final frame-level accuracy on the test recordings was 94.64%, and compression reduced the number of parameters by 34%.
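
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the zero fill value, the single frequency mask and single time mask, the assumption that the input is a log-mel spectrogram, and the layer sizes in the usage note are all assumptions made for illustration. The first function masks a random band of mel channels and a random run of frames, the second takes an unnormalized DCT-II across the mel axis to obtain MFCC-like features, and the last two count weights (biases ignored) for a standard and a depthwise separable convolution layer.

```python
import math
import random


def mask_spectrogram(log_mel, max_freq_width, max_time_width, rng):
    """Return a masked copy of log_mel (a list of frames, each a list of mel bands).

    One random band of mel channels and one random run of frames are set to 0.0.
    Requires max_freq_width <= number of mel bands and
    max_time_width <= number of frames.
    """
    n_frames, n_mels = len(log_mel), len(log_mel[0])
    out = [frame[:] for frame in log_mel]  # copy so the input stays untouched

    # Frequency mask: zero f consecutive mel bands in every frame.
    f = rng.randint(0, max_freq_width)
    f0 = rng.randint(0, n_mels - f)
    for frame in out:
        for m in range(f0, f0 + f):
            frame[m] = 0.0

    # Time mask: zero t consecutive frames.
    t = rng.randint(0, max_time_width)
    t0 = rng.randint(0, n_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * n_mels
    return out


def mfcc_from_log_mel(log_mel, n_coeffs):
    """Unnormalized DCT-II of each frame along the mel axis, first n_coeffs kept."""
    n_mels = len(log_mel[0])
    return [
        [
            sum(frame[m] * math.cos(math.pi * k * (m + 0.5) / n_mels)
                for m in range(n_mels))
            for k in range(n_coeffs)
        ]
        for frame in log_mel
    ]


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out
```

For a hypothetical 3 x 3 layer with 32 input and 64 output channels, `conv_params(3, 32, 64)` gives 18432 weights and `depthwise_separable_params(3, 32, 64)` gives 2336. This per-layer figure does not reproduce the 34% whole-model reduction reported above, which depends on which layers of the network are replaced. The augmentation step would be used as `mfcc_from_log_mel(mask_spectrogram(log_mel, 8, 10, random.Random(0)), 13)`, with the mask widths and coefficient count chosen here only as examples.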

Full Text