Abstract

Learning fine-grained discriminative information is essential to address the challenges of small inter-class differences and large intra-class differences in vehicle re-identification (Re-ID). Attention mechanisms are often used to capture important global information in images rather than fine-grained discriminative information. Studies have shown that multi-axis information interaction can enhance the feature representation ability of networks. This paper explores how to use multi-axis information interaction to facilitate more effective attention learning and how to capture important detailed information in local regions. We propose a multi-axis interactive multidimensional attention network (MIMA-Net) for vehicle Re-ID. The network allows information to interact along multiple axes and calibrates the weight distribution of features across multiple dimensions to learn subtle discriminative information in vehicle parts and regions. The window-channel attention module (W-CAM) in MIMA-Net facilitates the learning of channel attention by interacting first across locations and then across channels, while the channel group-spatial attention module (CG-SAM) facilitates the learning of spatial attention by interacting first across channels and then across locations. These two modules perform window partitioning in an a priori manner and channel semantic aggregation in an adaptive manner, respectively, to learn discriminative semantic features in vehicle parts. The two approaches complement each other to strengthen the feature representation ability of MIMA-Net. Extensive experiments on three large public datasets, VeRi-776, VehicleID, and VERI-Wild, verify the effectiveness of our MIMA-Net and show that our method achieves state-of-the-art performance.
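To make the "interact first across locations, then across channels" idea concrete, the following is a minimal NumPy sketch of that two-stage pattern. It is a hypothetical illustration only, not the paper's W-CAM: the window size, pooling choice, and softmax recalibration are all assumptions introduced here for clarity.

```python
import numpy as np

def window_channel_attention(x, win=2):
    """Illustrative two-stage attention: locations first, channels second.

    x: feature map of shape (C, H, W); H and W must be divisible by win.
    This is a hypothetical sketch, not the exact W-CAM from the paper.
    """
    C, H, W = x.shape
    # Stage 1: interact across locations — average-pool within win x win windows.
    xw = x.reshape(C, H // win, win, W // win, win).mean(axis=(2, 4))  # (C, H/win, W/win)
    # Stage 2: interact across channels — softmax over the channel axis per window,
    # producing channel attention weights conditioned on local spatial context.
    e = np.exp(xw - xw.max(axis=0, keepdims=True))
    attn = e / e.sum(axis=0, keepdims=True)
    # Broadcast the per-window channel weights back to (C, H, W) and recalibrate.
    attn_full = attn.repeat(win, axis=1).repeat(win, axis=2)
    return x * attn_full

feat = np.random.rand(8, 4, 4)
out = window_channel_attention(feat)
print(out.shape)  # (8, 4, 4)
```

A CG-SAM-style counterpart would reverse the order: aggregate channel groups first, then apply a softmax over spatial locations to produce spatial attention.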
