Multi-scale Temporal Cues Learning for Video Person Re-Identification

Jianing Li, Shiliang Zhang, Tiejun Huang

doi:10.1109/tip.2020.2972108


Open Access

https://doi.org/10.1109/tip.2020.2972108


Abstract

Temporal cues embedded in videos provide important clues for person Re-Identification (ReID). To efficiently exploit temporal cues with a compact neural network, this work proposes a novel 3D convolution layer called the Multi-scale 3D (M3D) convolution layer. The M3D layer is easy to implement and can be inserted into traditional 2D convolution networks to learn multi-scale temporal cues by end-to-end training. According to where it is inserted, the M3D layer has two variants: the local M3D layer and the global M3D layer. The local M3D layer is inserted between 2D convolution layers to learn spatial-temporal cues among adjacent 2D feature maps. The global M3D layer is computed on adjacent frame feature vectors to learn their global temporal relations. The local and global M3D layers hence learn complementary temporal cues. Their combination adds a fraction of parameters to a traditional 2D CNN, but leads to strong multi-scale temporal feature learning capability. The learned temporal feature is fused with a spatial feature to compose the final spatial-temporal representation for video person ReID. Evaluations on four widely used video person ReID datasets, i.e., MARS, DukeMTMC-VideoReID, PRID2011, and iLIDS-VID, demonstrate the substantial advantages of our method over the state of the art. For example, it achieves a rank-1 accuracy of 88.63% on MARS without re-ranking. Our method also achieves a reasonable trade-off between ReID accuracy and model size, e.g., it saves about 40% of the parameters of an I3D CNN.
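To make the idea of multi-scale temporal aggregation over frame feature vectors more concrete, here is a minimal pure-Python sketch. The abstract does not say how the temporal scales are realised, so this is an assumption rather than the authors' layer: it uses parallel temporal kernels with different dilation rates and sums their outputs with a residual connection. The function names `dilated_temporal_conv` and `multi_scale_temporal` are hypothetical, and a single scalar weight per kernel tap stands in for learned per-channel weights.

```python
from typing import List, Sequence

Frames = List[List[float]]  # T frames, each a D-dimensional feature vector


def dilated_temporal_conv(frames: Frames, kernel: Sequence[float], dilation: int) -> Frames:
    """Temporal convolution over per-frame vectors with zero padding.

    Assumption: one scalar weight per tap, shared across all channels.
    The kernel length is expected to be odd so it is centred on frame t.
    """
    num_frames, dim = len(frames), len(frames[0])
    half = len(kernel) // 2
    out: Frames = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for k, weight in enumerate(kernel):
            src = t + (k - half) * dilation  # larger dilation = longer temporal range
            if 0 <= src < num_frames:        # frames outside the clip count as zeros
                for d in range(dim):
                    acc[d] += weight * frames[src][d]
        out.append(acc)
    return out


def multi_scale_temporal(frames: Frames, kernel: Sequence[float],
                         dilations: Sequence[int] = (1, 2, 3)) -> Frames:
    """Residual sum of the input and one dilated temporal branch per scale."""
    out = [list(frame) for frame in frames]  # residual path keeps the spatial feature
    for rate in dilations:
        branch = dilated_temporal_conv(frames, kernel, rate)
        for t, vec in enumerate(branch):
            for d, value in enumerate(vec):
                out[t][d] += value
    return out


if __name__ == "__main__":
    clip = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]  # 4 frames, 2 channels
    print(multi_scale_temporal(clip, kernel=[0.5, 0.0, -0.5], dilations=(1, 2)))
```

Each branch sees the same frames at a different temporal stride, so short-range and long-range relations are captured side by side. In a trained network the kernel weights would be learned end to end rather than fixed as here.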


Journal: IEEE Transactions on Image Processing
Publication Date: Jan 1, 2020
Citations: 115

Similar Papers

Multi-Scale 3D Convolution Network for Video Based Person Re-Identification
Jianing Li ... Shiliang Zhang
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 33
17 Jul 2019

Neuronal Synchrony: A Versatile Code for the Definition of Relations?
Wolf Singer
Neuron | VOL. 24
01 Sep 1999

Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID
Djebril Mekhazni ... Maximilien Dufau
01 Jan 2023

Situational diversity in video person re-identification: introducing MSA-BUPT dataset
Ruining Zhao ... Fei Su
Complex & Intelligent Systems | VOL. 10
23 May 2024
