Vehicle Make and Model Recognition (VMMR) requires fast and accurate recognition of a vehicle’s information. Vision-based VMMR methods generally distinguish vehicle models by locating and extracting the discriminative part features of a vehicle. In this paper, we propose a Lightweight Recurrent Attention Unit (LRAU) to enhance the feature extraction ability of standard Convolutional Neural Network (CNN) architectures for VMMR. The proposed LRAU extracts discriminative part features by generating attention masks that locate the keypoints of a vehicle (e.g., logo, headlight). Each attention mask is generated from the feature maps received by the LRAU and the attention state produced by the preceding LRAU. By adding LRAUs that receive the multi-scale feature maps generated by a standard CNN architecture, discriminative features of different scales can be efficiently extracted and combined. We conduct comprehensive experiments on three challenging VMMR datasets to evaluate the proposed VMMR models. Experimental results show that our models perform stably under different environmental conditions. Our models achieve state-of-the-art results with 93.94% accuracy on the Stanford Cars dataset, 98.31% accuracy on the CompCars dataset, and 99.41% accuracy on the NTOU-MMR dataset. Moreover, we demonstrate that our models outperform traditional machine learning-based VMMR models in both recognition accuracy and processing speed. In addition, we construct a one-stage Vehicle Detection and Fine-grained Recognition (VDFR) model by combining our LRAU with a general object detection model. Results show that the proposed VDFR model achieves excellent performance at real-time processing speed.