Video-based Re-identification Research Articles

Video-based person re-identification (ReID) aims to exploit relevant features from spatial and temporal knowledge. Widely used methods include part-based and attention-based approaches that suppress irrelevant spatiotemporal features. However, it is still challenging to overcome inconsistencies across video frames caused by occlusion and imperfect detection. These mismatches make temporal processing ineffective and create an imbalance of crucial spatial information. To address these problems, we propose the Spatiotemporal Multi-Granularity Aggregation (ST-MGA) method, which is specifically designed to accumulate relevant features with spatiotemporally consistent cues. The proposed framework consists of three main stages: extraction, which extracts spatiotemporally consistent partial information; augmentation, which augments the partial information at different granularity levels; and aggregation, which aggregates the augmented spatiotemporal information. We first introduce the consistent part-attention (CPA) module, which extracts spatiotemporally consistent and well-aligned attentive parts. Sub-parts derived from CPA provide temporally consistent semantic information, which resolves misalignment in videos caused by occlusion or inaccurate detection and maximizes the efficiency of aggregation through uniform partial information. To enhance the diversity of spatial and temporal cues, we introduce the Multi-Attention Part Augmentation (MA-PA) block, which incorporates fine parts at various granular levels, and the Long-/Short-term Temporal Augmentation (LS-TA) block, which captures both long- and short-term temporal relations. Using densely separated part cues, ST-MGA fully exploits and aggregates the spatiotemporal multi-granular patterns by comparing relations between parts and scales. In the experiments, the proposed ST-MGA achieves state-of-the-art performance on several video-based ReID benchmarks (i.e., MARS, DukeMTMC-VideoReID, and LS-VID).
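The extraction, augmentation, and aggregation stages described above can be illustrated with a deliberately simplified sketch. It is not the ST-MGA implementation: the learned CPA attention is replaced here by fixed horizontal stripes, and the MA-PA and LS-TA blocks are replaced by plain mean and max pooling. The function names, the granularities (1, 2, 4), and the short-term window of 2 frames are all assumptions chosen for illustration.

```python
import numpy as np

def multi_granularity_parts(feats, granularities=(1, 2, 4)):
    """Pool per-frame feature maps into horizontal stripes at several granularities.

    feats: array of shape (T, C, H, W), one feature map per frame.
    Returns a dict g -> array of shape (T, g, C).
    NOTE: uniform stripes are a stand-in for learned part attention (assumption).
    """
    T, C, H, W = feats.shape
    parts = {}
    for g in granularities:
        if H % g != 0:
            raise ValueError(f"height {H} is not divisible by granularity {g}")
        # Split H into g stripes of H // g rows, then average each stripe.
        stripes = feats.reshape(T, C, g, H // g, W).mean(axis=(3, 4))  # (T, C, g)
        parts[g] = stripes.transpose(0, 2, 1)  # (T, g, C)
    return parts

def long_short_temporal(part_feats, short_window=2):
    """Summarize part features over short clips and over the whole tracklet.

    part_feats: array of shape (T, P, C).
    Returns (short, long): short has shape (T // short_window, P, C), long has shape (P, C).
    """
    T = part_feats.shape[0]
    if T < short_window:
        raise ValueError("tracklet is shorter than the short-term window")
    usable = T - T % short_window  # drop trailing frames that do not fill a window
    clips = part_feats[:usable].reshape(usable // short_window, short_window,
                                        *part_feats.shape[1:])
    short = clips.mean(axis=1)      # one vector per part per short clip
    long = part_feats.mean(axis=0)  # one vector per part for the whole tracklet
    return short, long

def video_descriptor(feats, granularities=(1, 2, 4), short_window=2):
    """Concatenate long-term and short-term part summaries across all granularities."""
    pieces = []
    for g, p in multi_granularity_parts(feats, granularities).items():
        short, long = long_short_temporal(p, short_window)
        pieces.append(long.reshape(-1))               # long-term cue
        pieces.append(short.max(axis=0).reshape(-1))  # strongest short-term response
    return np.concatenate(pieces)
```

With granularities (1, 2, 4) there are 7 parts in total, so a tracklet with C channels yields a descriptor of length 2 × 7 × C. A learned method would weight and relate these parts rather than simply concatenating them.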

Video-based re-identification (ReID) is a crucial task in computer vision that has drawn increasing attention due to advances in deep learning (DL) and modern computational devices. Despite recent success with CNN architectures, single models (e.g., 2D-CNNs or 3D-CNNs) alone fail to leverage temporal information together with spatial cues. This is because uncontrolled surveillance scenarios and variable poses lead to inevitable misalignment of ROIs across tracklets, accompanied by occlusion and motion blur. In this context, designing temporal and spatial cues for two different models and combining them can be beneficial, considering the global view of a video tracklet. 3D-CNNs encode temporal information, while 2D-CNNs extract spatial or appearance information. In this paper, we propose a Spatio-Temporal Cross Attention (STCA) network that uses both 2D-CNNs and 3D-CNNs: it computes a cross-attention map from layers of the 3D-CNN and the 2D-CNN along a person's trajectory, uses that map to gate the subsequent layers of the 2D-CNN, and highlights appearance features relevant to person ReID. Given an input tracklet, the proposed cross attention (CA) captures the salient regions that propagate throughout the tracklet to obtain the global view. This provides a spatio-temporal attention approach that can be dynamically aggregated with the spatial features of 2D-CNNs to perform finer-grained recognition. Additionally, we use cosine similarity both during triplet sampling and when calculating the final recognition score. Experimental analyses on three challenging benchmark datasets indicate that integrating spatio-temporal cross attention into state-of-the-art video ReID backbone CNN architectures improves their recognition accuracy.
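The use of cosine similarity for both triplet sampling and final scoring can be sketched as follows. This is a minimal illustration, not the STCA code: the abstract does not say which sampling strategy is used, so batch-hard mining is assumed here, and the margin of 0.3 and all function names are hypothetical.

```python
import numpy as np

def cosine_distance_matrix(a, b, eps=1e-12):
    """Pairwise cosine distance (1 - cosine similarity) between rows of a and rows of b."""
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return 1.0 - a_n @ b_n.T

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Triplet loss with batch-hard mining under cosine distance.

    For each anchor, pick the farthest same-identity sample and the closest
    different-identity sample. Batch-hard mining and the margin are assumptions.
    """
    labels = np.asarray(labels)
    dist = cosine_distance_matrix(embeddings, embeddings)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same identity, not the anchor itself
    neg_mask = ~same
    if not (pos_mask.any(axis=1).all() and neg_mask.any(axis=1).all()):
        raise ValueError("every anchor needs at least one positive and one negative")
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)
    return float(np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean())

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    dist = cosine_distance_matrix(query[None, :], gallery)[0]
    return np.argsort(dist)
```

Using the same distance for mining and for ranking keeps training and retrieval consistent: the embedding is optimized under the same notion of closeness that later decides the match.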

Articles published on Video-based Re-identification

A review on video person re-identification based on deep learning

Deep video-based person re-identification (Deep Vid-ReID): comprehensive survey

Multi-Granularity Aggregation with Spatiotemporal Consistency for Video-Based Person Re-Identification

AA-RGTCN: reciprocal global temporal convolution network with adaptive alignment for video-based person re-identification

Rethink Motion Information for Occluded Person Re-Identification

Pose-Aided Video-Based Person Re-Identification via Recurrent Graph Convolutional Network

Multi-Context Grouped Attention for Unsupervised Person Re-Identification

PolarBearVidID: A Video-Based Re-Identification Benchmark Dataset for Polar Bears

An Adaptive Partitioning and Multi-Granularity Network for Video-Based Person Re-Identification

STCA: Utilizing a spatio-temporal cross-attention network for enhancing video person re-identification

Relation-based global-partial feature learning network for video-based person re-identification

Motion Feature Aggregation for Video-Based Person Re-Identification

Temporal Weighting Appearance-Aligned Network for Nighttime Video Retrieval

Spatial temporal and channel aware network for video-based person re-identification

Exploiting Global Camera Network Constraints for Unsupervised Video Person Re-Identification

Rethinking Temporal Fusion for Video-Based Person Re-Identification on Semantic and Time Aspect

Instance Hard Triplet Loss for In-video Person Re-identification

Comprehensive feature fusion mechanism for video-based person re-identification via significance-aware attention

Video Based Person Re Identification Methods, Datasets, and Deep Learning

Video-based person re-identification using a novel feature extraction and fusion technique
