An Efficient Non-local Attention Network for Video-based Person Re-identification

Zhen Wang, Shixian Luo, Jun Yin, Huadong Pan, He Sun

doi:10.1145/3377170.3377253

Abstract

A spatial and temporal attention strategy based on Non-local Networks is proposed for video-based person re-identification (Re-ID). Most existing methods design attention mechanisms on high-level features, ignoring the low-level features that carry more detail. The proposed method adopts non-local networks, which can aggregate features according to feature correlation at any level. The contributions of this work are twofold: (i) the spatial and temporal redundancy in video-based person Re-ID is analyzed; (ii) an Efficient Non-local Attention Network is designed that reduces computational complexity by exploiting this spatial and temporal redundancy. We conduct extensive experiments on two large-scale benchmarks, MARS and DukeMTMC-VideoReID. The experiments show that our model achieves 85.2% mAP and 88.3% rank-1 accuracy on MARS, and 95.4% mAP and 95.6% rank-1 accuracy on DukeMTMC-VideoReID, without re-ranking, significantly outperforming the state of the art.
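To make the idea of aggregating features by correlation concrete, the following is a minimal NumPy sketch of a generic non-local (embedded Gaussian) operation over all space-time positions of a clip. It is not the Efficient Non-local Attention Network proposed here; the function name `non_local_attention`, the weight matrices, and the toy dimensions are assumptions chosen for illustration only.

```python
import numpy as np

def non_local_attention(x, w_theta, w_phi, w_g, w_z):
    """Generic non-local operation (illustrative sketch, not the paper's model).

    x: (N, C) features, where N = T*H*W positions flattened over time and space.
    Returns the residual output (N, C) and the (N, N) attention weights.
    """
    theta = x @ w_theta                  # query embedding, (N, C_inner)
    phi = x @ w_phi                      # key embedding,   (N, C_inner)
    g = x @ w_g                          # value embedding, (N, C_inner)

    scores = theta @ phi.T               # pairwise correlation, (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over positions

    y = weights @ g                      # correlation-weighted aggregation
    return x + y @ w_z, weights          # residual connection

# Toy dimensions (assumed): 4 frames, 8x4 feature map, 16 channels.
T, H, W, C, C_inner = 4, 8, 4, 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T * H * W, C))
w_theta = rng.standard_normal((C, C_inner)) * 0.1
w_phi = rng.standard_normal((C, C_inner)) * 0.1
w_g = rng.standard_normal((C, C_inner)) * 0.1
w_z = rng.standard_normal((C_inner, C)) * 0.1

out, weights = non_local_attention(x, w_theta, w_phi, w_g, w_z)
```

In this sketch the affinity matrix has N × N entries with N = T·H·W, so its cost grows quadratically with the number of frames and spatial positions; this is the kind of cost that exploiting spatial and temporal redundancy would aim to reduce.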

Full Text