Abstract

Video-based person re-identification (Re-ID) leverages the rich spatio-temporal information embedded in sequence data to further improve retrieval accuracy compared with single-image Re-ID. However, it also brings new difficulties: 1) spatial and temporal information must be considered simultaneously; 2) pedestrian video data often contains redundant information; and 3) the data suffers from quality problems such as occlusion and background clutter. To address these problems, we propose a novel two-stream Dynamic Pyramid Representation Model (DPRM). DPRM consists of three sub-models, i.e., the Pyramidal Distribution Sampling Method (PDSM), Dynamic Pyramid Dilated Convolution (DPDC), and Pyramid Attention Pooling (PAP). PDSM performs more effective data pre-processing according to the semantic distribution of the sequence. DPDC and PAP can be regarded as two streams that describe the motion context and the static appearance of a video sequence, respectively. By fusing the two-stream features, we obtain a comprehensive spatio-temporal representation. Notably, the dynamic pyramid strategy is applied throughout the whole model. This strategy exploits multi-scale features under an attention mechanism to capture the most discriminative features and mitigate the impact of video data quality problems such as partial occlusion. Extensive experiments demonstrate the superior performance of DPRM. For instance, it achieves 83.0% mAP and 89.0% Rank-1 accuracy on the MARS dataset, reaching the state of the art.
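To make the two-stream idea concrete, the following is a minimal sketch, not the authors' implementation: a dilated temporal-convolution stream standing in for DPDC (motion context) and an attention-weighted temporal pooling stream standing in for PAP (static appearance), fused by concatenation into one sequence-level feature. All module names, dimensions, dilation rates, and the concatenation fusion are assumptions for illustration only.

```python
# Hedged sketch of the two-stream fusion described in the abstract.
# Module internals are placeholders, not the DPRM architecture itself.
import torch
import torch.nn as nn


class MotionStream(nn.Module):
    """Dilated 1-D convolutions over per-frame features (stand-in for DPDC)."""
    def __init__(self, dim=2048, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )

    def forward(self, x):            # x: (B, T, dim) per-frame features
        x = x.transpose(1, 2)        # (B, dim, T) for Conv1d
        out = sum(torch.relu(b(x)) for b in self.branches)
        return out.mean(dim=2)       # temporal average -> (B, dim)


class AppearanceStream(nn.Module):
    """Attention-weighted temporal pooling (stand-in for PAP)."""
    def __init__(self, dim=2048):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (B, T, dim)
        w = torch.softmax(self.score(x), dim=1)  # per-frame attention weights
        return (w * x).sum(dim=1)                # weighted sum -> (B, dim)


class TwoStreamFusion(nn.Module):
    """Fuses motion and appearance streams into one sequence feature."""
    def __init__(self, dim=2048):
        super().__init__()
        self.motion = MotionStream(dim)
        self.appearance = AppearanceStream(dim)

    def forward(self, x):                        # x: (B, T, dim)
        return torch.cat([self.motion(x), self.appearance(x)], dim=1)


feats = torch.randn(4, 8, 2048)                  # 4 tracklets, 8 sampled frames each
print(TwoStreamFusion()(feats).shape)            # torch.Size([4, 4096])
```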
