Incorporating texture and silhouette for video-based person re-identification

Shutao Bai, Hong Chang, Bingpeng Ma

doi:10.1016/j.patcog.2024.110759

Abstract

Silhouette is an effective modality for video-based person re-identification (ReID) because it carries cues, such as stature and gait, that complement the RGB modality. However, recent silhouette-assisted methods neither fully explore the spatial–temporal relations within each modality nor consider cross-modal complementarity during fusion. To address these two issues, we propose a Complete Relational Framework with two key components. The first, the Spatial-Temporal Relational Module (STRM), explores intra-modal spatiotemporal relations. It decomposes a video's spatiotemporal context into local (fine-grained) and global (semantic) aspects and models them sequentially to enhance the representation of each modality. The second, the Modality-Channel Relational Module (MCRM), explores the complementarity between RGB and silhouette videos. It semantically aligns the two modalities and multiplies them to capture their complementary interrelations. With STRM modeling intra-modal relations and MCRM modeling cross-modal relations, our method achieves superior results on multiple benchmarks with only minimal additional parameters and FLOPs. Code and models will be made publicly available.
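The abstract describes STRM and MCRM only at a high level. The following is therefore a toy sketch of the general pipeline it outlines: local-then-global temporal modeling per modality, followed by alignment and multiplicative fusion. It is not the authors' implementation. All function names are hypothetical, and each operation is an assumption chosen for illustration:

- A sliding-window temporal average stands in for the local/fine-grained step.
- Parameter-free self-attention across frames stands in for the global/semantic step.
- L2 normalization stands in for semantic alignment. A learned projection would be more typical.
- A channel-wise product provides the multiplicative cross-modal interaction.

Each video is a plain Python list of per-frame feature vectors.

```python
import math
from typing import List

Vector = List[float]   # one C-dimensional feature vector
Video = List[Vector]   # T frames of C-dimensional features (T >= 1 assumed)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_temporal(video: Video, window: int = 3) -> Video:
    """Hypothetical local/fine-grained step: average each frame with its
    temporal neighbours inside a sliding window (clipped at clip borders)."""
    half = window // 2
    c_dim = len(video[0])
    out = []
    for t in range(len(video)):
        neigh = video[max(0, t - half):t + half + 1]
        out.append([sum(f[c] for f in neigh) / len(neigh) for c in range(c_dim)])
    return out


def global_self_attention(video: Video) -> Video:
    """Hypothetical global/semantic step: each frame attends to every frame
    (scaled dot-product, no learned projections) and adds the result back."""
    scale = math.sqrt(len(video[0]))
    out = []
    for f_t in video:
        scores = [dot(f_t, f_j) / scale for f_j in video]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        context = [sum(e * f_j[c] for e, f_j in zip(exps, video)) / z
                   for c in range(len(f_t))]
        out.append([x + y for x, y in zip(f_t, context)])  # residual connection
    return out


def temporal_mean(video: Video) -> Vector:
    return [sum(f[c] for f in video) / len(video) for c in range(len(video[0]))]


def encode_modality(video: Video, window: int = 3) -> Vector:
    # Assumed order: local step first, then global step, then average over time.
    return temporal_mean(global_self_attention(local_temporal(video, window)))


def l2_normalize(v: Vector, eps: float = 1e-12) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / max(norm, eps) for x in v]


def fuse(rgb: Vector, sil: Vector) -> Vector:
    """Hypothetical cross-modal step: put both modalities on a common scale
    (a stand-in for learned semantic alignment), multiply them channel-wise,
    and add the product to the two aligned features."""
    r, s = l2_normalize(rgb), l2_normalize(sil)
    interaction = [a * b for a, b in zip(r, s)]  # channel-wise product
    return [a + b + i for a, b, i in zip(r, s, interaction)]


def encode_pair(rgb_video: Video, sil_video: Video) -> Vector:
    return fuse(encode_modality(rgb_video), encode_modality(sil_video))


if __name__ == "__main__":
    rgb = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.0], [0.1, 1.0, 0.2], [0.2, 0.7, 0.1]]
    sil = [[0.0, 0.5, 0.5], [0.1, 0.4, 0.6], [0.0, 0.6, 0.4], [0.1, 0.5, 0.5]]
    print(encode_pair(rgb, sil))  # one 3-dimensional fused descriptor
```

In this sketch, `encode_pair` turns an RGB clip and a silhouette clip into a single descriptor with the same channel count as the frame features. The sketch adds the channel-wise product to the two aligned features rather than using the product alone. As a result, a channel on which one modality is weak still keeps the other modality's signal. This is a design assumption of the illustration, not a detail stated in the abstract.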
