Abstract

To address the difficulty of extracting features from time–frequency images for car engine sound recognition, we propose a recognition method based on a deformable feature map residual network. A deformable feature map residual block consists of offset layers and convolutional layers. The offset layers shift the pixels of the input feature map. Through shortcut connections, the shifted feature map is superimposed on the feature map extracted by the convolutional layers, which concentrates the network's sampling on the region of interest and passes the information in the offset feature map to the lower layers of the network. A deformable convolution residual network is then designed, and the features it extracts are fused with the Mel-frequency cepstral coefficients of the car engine sounds. After recalibration by a squeeze-and-excitation block, the fused features are fed into a fully connected layer for classification. Experiments on a car engine sound dataset show that the proposed method achieves an accuracy of 84.28%. Compared with existing state-of-the-art methods, the proposed method improves the accuracy of recognizing car engine sounds under various operating conditions over the method based on dictionary learning and a convolutional neural network.
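
As a rough illustration of the recalibration step, the following NumPy sketch applies a standard squeeze-and-excitation gate to a fused feature tensor. It is not the authors' implementation: the abstract does not say how the network features and the Mel-frequency cepstral coefficients are fused, so channel-wise concatenation, the tensor shapes, the random weights, and the names `squeeze_excite`, `cnn_maps`, and `mfcc_maps` are all assumptions made for this example.

```python
import numpy as np

def squeeze_excite(features, w1, w2):
    """Standard squeeze-and-excitation gating on a (C, H, W) tensor.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is an
    assumed reduction ratio.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid gives gates in (0, 1).
    hidden = np.maximum(w1 @ squeezed, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Recalibration: rescale every channel by its gate.
    return features * gates[:, None, None], gates

rng = np.random.default_rng(0)
cnn_maps = rng.standard_normal((8, 4, 4))    # hypothetical network features
mfcc_maps = rng.standard_normal((4, 4, 4))   # hypothetical MFCC features, same spatial size
# Assumed fusion: concatenate along the channel axis.
fused = np.concatenate([cnn_maps, mfcc_maps], axis=0)

w1 = rng.standard_normal((3, 12)) * 0.1      # untrained, illustrative weights (r = 4)
w2 = rng.standard_normal((12, 3)) * 0.1
recalibrated, gates = squeeze_excite(fused, w1, w2)
```

In a trained model the two weight matrices would be learned, and the recalibrated tensor would be flattened and passed to the fully connected classifier.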

Highlights

  • We propose a deformable feature map residual network to identify the logarithmic Mel spectrogram of engine sounds

  • We study car engine sounds under five typical operating conditions and propose a deformable feature map residual network for engine sound recognition

  • Without changing the positions of the sampling points, the deformable feature map residual block (DFMRB) passes the offset feature map information to the lower-level network through a shortcut connection, which focuses sampling on the region of interest in the input feature map and improves feature extraction from the logarithmic Mel spectrograms of engine sounds under the five operating conditions


Summary

Introduction

We propose a deformable feature map residual network to identify the logarithmic Mel spectrogram of engine sounds. The designed deformable feature map residual block (DFMRB) changes the positions of the pixels in the feature map and obtains a shifted feature map by adding an offset variable to the logarithmic Mel spectrogram.
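
The idea of a block that shifts feature-map pixels and adds the shifted map to a convolutional branch can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' architecture: in the paper the offsets come from offset layers, whereas here they are passed in as a fixed array, and the bilinear resampling, the single shared 3×3 kernel, the ReLU placement, and the function names `shift_feature_map`, `conv3x3_same`, and `dfmrb` are all hypothetical choices for illustration.

```python
import numpy as np

def shift_feature_map(x, offsets):
    """Resample x (C, H, W) at positions displaced by offsets (2, H, W).

    offsets[0] holds vertical and offsets[1] horizontal displacements;
    fractional displacements are handled by bilinear interpolation.
    """
    c, h, w = x.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Displaced sampling coordinates, clamped to the map borders.
    py = np.clip(ys + offsets[0], 0, h - 1)
    px = np.clip(xs + offsets[1], 0, w - 1)
    y0 = np.floor(py).astype(int)
    x0 = np.floor(px).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = py - y0, px - x0
    top = x[:, y0, x0] * (1 - wx) + x[:, y0, x1] * wx
    bottom = x[:, y1, x0] * (1 - wx) + x[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def conv3x3_same(x, kernel):
    """Apply one 3x3 kernel to every channel with zero padding."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[:, i:i + h, j:j + w]
    return out

def dfmrb(x, offsets, kernel):
    """Convolutional branch plus shifted-map shortcut (assumed layout)."""
    conv_out = np.maximum(conv3x3_same(x, kernel), 0.0)
    shifted = shift_feature_map(x, offsets)
    # Shortcut connection: superimpose the shifted map on the conv output.
    return np.maximum(conv_out + shifted, 0.0)

x = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)
zero_offsets = np.zeros((2, 4, 5))
identity_kernel = np.zeros((3, 3))
identity_kernel[1, 1] = 1.0
y = dfmrb(x, zero_offsets, identity_kernel)
```

With zero offsets and an identity kernel, both branches return the input unchanged, so the block output is simply twice the (non-negative) input; non-zero offsets change only what the shortcut branch contributes.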

