Abstract

Vehicle re-identification (ReID) has attracted much attention and is significant for traffic security surveillance. Because different cameras capture the same vehicle from a variety of views, and different vehicles can be highly similar in visual appearance, it is necessary to explore how to effectively exploit local detail information for collaborative perception that highlights discriminative appearance features. Unlike existing local-feature exploration methods, which rely on extra part or keypoint information, we propose a global collaborative learning Transformer guided by local abstract features, named LG-CoT, which aims to highlight the highest-attention regions of vehicle images. We adopt the Vision Transformer (ViT) as our backbone to extract global features and obtain all local tokens. To suppress disturbance from the background and drive the network to focus more on details, all attention maps, which carry both low-level texture information and high-level semantic information, are multiplied together to obtain the local regions with the highest attention. Finally, we design a local-attention-guided pose-optimization feature encoding module, which helps the global features attend to local regions adaptively. Extensive experiments on two popular datasets and a dataset we built in a T-junction traffic scene show that our method achieves competitive performance.
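The abstract's multiplication of per-layer attention maps to locate the highest-attention regions resembles the attention-rollout technique for ViTs. The sketch below is a minimal, hypothetical illustration of that idea (the function name, head-averaged inputs, and top-k selection are assumptions, not the paper's exact procedure): per-layer attention matrices are combined with the residual connection, renormalized, and accumulated by matrix multiplication, and the [CLS] row of the product then ranks the local patch tokens.

```python
import numpy as np

def highest_attention_tokens(attn_maps, top_k=4):
    """Multiply per-layer ViT attention maps (attention-rollout style)
    and return indices of the patch tokens that the [CLS] token
    attends to most.

    attn_maps: list of (num_tokens, num_tokens) head-averaged attention
               matrices, one per Transformer layer; token 0 is [CLS].
    """
    num_tokens = attn_maps[0].shape[0]
    rollout = np.eye(num_tokens)
    for attn in attn_maps:
        # Account for the residual connection, then renormalize rows
        # so each remains a probability distribution.
        attn = attn + np.eye(num_tokens)
        attn = attn / attn.sum(axis=-1, keepdims=True)
        # Accumulate attention flow across layers by multiplication.
        rollout = attn @ rollout
    cls_attention = rollout[0, 1:]  # [CLS] -> patch tokens only
    return np.argsort(cls_attention)[::-1][:top_k]
```

In this sketch the returned indices would mark the local regions that guide the global feature, the role the abstract assigns to its highest-attention regions.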
