Abstract

Visible-infrared person re-identification (VI-REID) plays a critical role in night-time surveillance applications. Most methods attempt to reduce the cross-modality gap by extracting modality-shared features. However, they neglect the distinct image-level discrepancies among heterogeneous pedestrian images. In this article, we propose a reciprocal bidirectional framework (RBDF) to achieve modality unification before discriminative feature learning. The bidirectional image translation subnetworks learn two opposite mappings between the visible and infrared modalities. In particular, we investigate the characteristics of the latent space and design a novel associated loss that pulls the distributions of the intermediate representations of the two mappings closer together. Mutual interaction between the two opposite mappings helps the network generate heterogeneous images that closely resemble the real images. Hence, concatenating the original and generated images can eliminate the modality gap. During feature learning, the attention-based feature embedding network learns more discriminative representations through identity classification and feature metric learning. Experimental results indicate that our method achieves state-of-the-art performance. For instance, we achieve 54.41% mAP and 57.66% rank-1 accuracy on the SYSU-MM01 dataset, outperforming existing works by a large margin.
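The abstract does not give the exact form of the associated loss. As a minimal sketch only, the snippet below illustrates one plausible reading: each translation direction (visible-to-infrared and infrared-to-visible) produces an intermediate latent representation, and a simple distance between the two latents is penalised so their distributions are pulled together. The module names, the toy encoder/decoder, and the use of a mean-squared distance are all assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: an "associated loss" between the latent codes of two
# opposite translation mappings. Architecture and loss form are assumed.
import torch
import torch.nn as nn


class TranslationBranch(nn.Module):
    """Toy encoder-decoder standing in for one translation direction."""

    def __init__(self, channels: int = 3, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, latent_dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, channels, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        z = self.encoder(x)            # intermediate (latent) representation
        return self.decoder(z), z      # translated image and its latent code


def associated_loss(z_vis2ir: torch.Tensor, z_ir2vis: torch.Tensor) -> torch.Tensor:
    """Pull the two latent representations together (here: mean L2 distance)."""
    return torch.mean((z_vis2ir - z_ir2vis) ** 2)


# Usage: translate a visible and an infrared image in opposite directions,
# then penalise divergence between their intermediate representations.
vis2ir, ir2vis = TranslationBranch(), TranslationBranch()
x_vis = torch.randn(2, 3, 128, 64)   # visible images (batch, C, H, W)
x_ir = torch.randn(2, 3, 128, 64)    # infrared images
fake_ir, z_v = vis2ir(x_vis)
fake_vis, z_i = ir2vis(x_ir)
loss_assoc = associated_loss(z_v, z_i)
```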
