Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging task in computer vision that aims to match people across images from the visible and infrared modalities. The widely used VI-ReID framework consists of a convolutional backbone network that extracts visual features and a feature embedding network that projects the heterogeneous features into a common feature space. However, many studies based on existing pre-trained models neglect potential correlations between different locations and channels within a single sample during feature extraction. Inspired by the success of the Transformer in computer vision, we extend it to enhance feature representation for VI-ReID. In this paper, we propose a discriminative feature learning network based on a visual Transformer (DFLN-ViT) for VI-ReID. Firstly, to capture long-term dependencies between different locations, we propose a spatial feature awareness module (SAM), which utilizes a single-layer Transformer with a novel patch-embedding strategy to encode location information. Secondly, to refine the representation at each channel, we design a channel feature enhancement module (CEM). The CEM treats the features of each channel as a sequence of Transformer inputs, taking advantage of the Transformer's ability to model long-term dependencies. Finally, we propose a Triplet-aided Hetero-Center (THC) loss that learns more discriminative feature representations by balancing the cross-modality and intra-modality distances between class centers. Experimental results on two datasets show that our method significantly improves VI-ReID performance, outperforming most state-of-the-art methods.
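To make the channel-as-sequence idea behind the CEM concrete, the following is a minimal PyTorch sketch, not the authors' implementation: each channel's flattened spatial map is treated as one token, and self-attention is applied across channels. The class name `ChannelAttnSketch`, the use of `nn.TransformerEncoderLayer`, the residual connection, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttnSketch(nn.Module):
    """Illustrative sketch of a channel-as-sequence Transformer block.

    Each of the C channels of a (B, C, H, W) feature map becomes one
    token of dimension H*W, so self-attention models long-term
    dependencies across channels (assumption: this mirrors the CEM idea,
    not the paper's exact architecture).
    """

    def __init__(self, height: int, width: int, num_heads: int = 4):
        super().__init__()
        d_model = height * width  # token dim = flattened spatial map
        # d_model must be divisible by num_heads for multi-head attention
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2)          # (B, C, H*W): channels as a sequence
        tokens = self.encoder(tokens)  # attention across the channel axis
        # Residual refinement of the input features (assumption)
        return tokens.view(b, c, h, w) + x

# Usage with an illustrative backbone feature map of shape (B, C, H, W)
feat = torch.randn(2, 512, 8, 8)
refined = ChannelAttnSketch(height=8, width=8)(feat)
print(refined.shape)  # torch.Size([2, 512, 8, 8])
```

The SAM described above would instead tokenize the map along the spatial axis (one token per location patch) before a single Transformer layer; the sketch differs only in which axis is flattened into the sequence dimension.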