Abstract

Vehicle re-identification (ReID) aims to retrieve images of the same vehicle across different cameras and can be regarded as the most fine-grained ID-level classification task. It is fundamentally challenging because a vehicle with the same ID can show large appearance differences (especially across viewpoints), while vehicles with different IDs may differ only subtly. Spatial attention mechanisms, which have proven effective in many computer vision tasks, also play an important role in vehicle ReID; however, they often require expensive key-point labels or suffer from noisy attention masks when trained without them. In this work, we propose a transformer-based attention network (TAN) that learns spatial attention information and thereby facilitates the learning of discriminative features for vehicle ReID. Specifically, in contrast to previous studies that adopt a transformer network, we design the attention network as an independent branch that can be flexibly used in various tasks. Moreover, we combine the TAN with two other branches: one that extracts global features describing image-level structure, and one that extracts auxiliary side-attribute features that are invariant to viewpoint, such as color and car type. To validate the proposed approach, experiments were conducted on two vehicle datasets (VeRi-776 and VehicleID) and a person dataset (Market-1501). The experimental results demonstrate that the proposed TAN improves performance on both the vehicle and person ReID tasks, and the proposed method achieves state-of-the-art (SOTA) performance.
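
To make the three-branch design concrete, the sketch below shows one plausible layout in PyTorch: a shared CNN backbone feeding (i) a global-feature branch, (ii) a transformer-based attention branch over the flattened spatial feature map, and (iii) a side-attribute branch predicting viewpoint-invariant attributes such as color and car type. This is a minimal sketch under stated assumptions; the class name ThreeBranchReID, the ResNet-50 backbone, the head sizes, and all hyperparameters are illustrative and are not the authors' implementation.

```python
# Minimal sketch of a three-branch ReID model: global branch, transformer
# attention branch (TAN-like), and side-attribute branch. All names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torchvision

class ThreeBranchReID(nn.Module):
    def __init__(self, num_ids, num_colors=10, num_types=9, dim=2048):
        super().__init__()
        # Shared backbone: ResNet-50 up to the last conv stage (assumption).
        resnet = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

        # Branch 1: global features via average pooling.
        self.global_pool = nn.AdaptiveAvgPool2d(1)

        # Branch 2: transformer encoder over the spatial tokens,
        # standing in for the attention branch.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.tan = nn.TransformerEncoder(encoder_layer, num_layers=2)

        # Branch 3: viewpoint-invariant side attributes (color, car type).
        self.color_head = nn.Linear(dim, num_colors)
        self.type_head = nn.Linear(dim, num_types)

        # ID classifier on the concatenated global + attention features.
        self.id_head = nn.Linear(dim * 2, num_ids)

    def forward(self, x):
        fmap = self.backbone(x)                   # (B, C, H, W)
        g = self.global_pool(fmap).flatten(1)     # global feature (B, C)

        tokens = fmap.flatten(2).transpose(1, 2)  # (B, H*W, C)
        att = self.tan(tokens).mean(dim=1)        # attention feature (B, C)

        feat = torch.cat([g, att], dim=1)         # ReID embedding
        return self.id_head(feat), self.color_head(g), self.type_head(g)

# Illustrative usage with a dummy batch; num_ids is an arbitrary example.
model = ThreeBranchReID(num_ids=576)
logits_id, logits_color, logits_type = model(torch.randn(2, 3, 256, 256))
```

In such a setup, the ID logits would typically be trained with a classification or metric-learning loss, while the color and type heads provide auxiliary supervision that encourages viewpoint-invariant features, in line with the role of the side-attribute branch described above.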
