A Feature Map is Worth a Video Frame: Rethinking Convolutional Features for Visible-Infrared Person Re-identification

Qiaolin He, Haifeng Hu, Zhijie Zheng

doi:10.1145/3617375

Abstract

Visible-Infrared Person Re-identification (VI-ReID) aims to search for the identity of the same person across different spectra. In VI-ReID, the feature maps obtained from the convolutional layers are generally used for loss calculation in the later stages of the model, but their role in the early and middle stages remains unexplored. In this article, we propose a novel Rethinking Convolutional Features (ReCF) approach for VI-ReID. ReCF consists of two modules: Middle Feature Generation (MFG), which uses the feature maps in the early stage to reduce the significant modality gap, and Temporal Feature Aggregation (TFA), which uses the feature maps in the middle stage to aggregate multi-level features and enlarge the receptive field. MFG generates middle-modality features with a learnable convolution layer that serves as a bridge between the RGB and IR modalities; this is more flexible than using fixed-parameter grayscale images and yields a better middle modality that further reduces the modality gap. TFA treats the convolution process as a video sequence, in which the feature map of each convolution layer can be regarded as a video frame. On this basis, we obtain a multi-level receptive field and a temporal refinement. In addition, we introduce a color-unrelated loss and a modality-unrelated loss that constrain the modality features to provide a common feature representation space. Experimental results on challenging VI-ReID datasets demonstrate that our proposed method achieves state-of-the-art performance.
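
The contrast between a fixed-parameter grayscale image and a learnable convolution layer can be illustrated with a small sketch. The abstract does not specify the MFG architecture, so the code below only assumes, for illustration, a single 1x1 (pointwise) convolution that maps three RGB channels to one channel. The names pointwise_conv, to_grayscale, and GRAY_WEIGHTS are hypothetical, and the "learned" weights are made-up values rather than results from the paper. Under that assumption, the standard grayscale conversion is one particular setting of the convolution weights, which training would be free to move away from.

```python
# Hypothetical sketch, not the authors' MFG module: a 1x1 convolution over
# the channel axis, written in pure Python so it runs without dependencies.

# ITU-R BT.601 luma coefficients, the usual fixed grayscale weights.
GRAY_WEIGHTS = (0.299, 0.587, 0.114)


def pointwise_conv(image, weights, bias=0.0):
    """Map an H x W image of (r, g, b) pixels to a single-channel H x W map.

    `weights` holds one coefficient per input channel. In a real model these
    would be learnable parameters; here they are passed in explicitly.
    """
    return [
        [sum(w * c for w, c in zip(weights, pixel)) + bias for pixel in row]
        for row in image
    ]


def to_grayscale(image):
    """Fixed-parameter grayscale: the same 1x1 convolution with frozen weights."""
    return pointwise_conv(image, GRAY_WEIGHTS)


# A 2 x 2 toy image: pure red, pure green, pure blue, and white.
image = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]

gray = to_grayscale(image)

# Illustrative weights only (an assumption, not values from the paper):
# a trained layer is not tied to the grayscale coefficients.
learned = pointwise_conv(image, (0.5, 0.3, 0.2))
```

In this sketch the grayscale map and the "learned" map differ only in their three coefficients, which is the sense in which a learnable layer is more flexible than a fixed conversion.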

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Publication Date: Oct 18, 2023
Citations: 2

