Abstract

A spatial and temporal attention strategy based on non-local networks is proposed for video-based person re-identification (Re-ID). Most existing methods design attention mechanisms on high-level features and thereby ignore low-level features, which carry more detail. The proposed method adopts non-local networks, which can aggregate features according to feature correlation at any level. The contributions of this work are twofold: (i) the spatial and temporal redundancy in video-based person Re-ID is analyzed; (ii) an Efficient Non-local Attention Network is designed that reduces computational complexity by exploiting this spatial and temporal redundancy. We conduct extensive experiments on two large-scale benchmarks, MARS and DukeMTMC-VideoReID. The experiments show that our model achieves 85.2% mAP and 88.3% rank-1 accuracy on MARS, and 95.4% mAP and 95.6% rank-1 accuracy on DukeMTMC-VideoReID, without re-ranking, significantly outperforming the state of the art.
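To make the idea of aggregating features by correlation concrete, the following is a minimal sketch of a generic non-local operation, not the network proposed in this work. It assumes identity embeddings in place of the learned projections a real non-local block would use, a dot-product correlation with softmax normalisation, and a residual connection; the function name `non_local` and the flat list-of-vectors input layout are illustrative choices only.

```python
import math

def non_local(features):
    """Generic non-local aggregation (illustrative sketch).

    features: list of N feature vectors (lists of floats), one per
    space-time position, i.e. N = frames * height * width.
    Returns N vectors: each input plus a correlation-weighted sum
    over all positions (residual connection).
    Assumption: identity embeddings instead of learned projections.
    """
    out = []
    for xi in features:
        # Correlation of position i with every position j (dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        # Softmax over j, shifted by the max for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all positions, channel by channel.
        agg = [sum(w * xj[c] for w, xj in zip(weights, features))
               for c in range(len(xi))]
        # Residual: add the aggregated context back to the input.
        out.append([x + a for x, a in zip(xi, agg)])
    return out
```

Because every position is compared with every other position, the cost grows quadratically with the number of space-time positions N. That quadratic term is what makes redundancy among positions worth exploiting when the operation is applied to low-level feature maps of a video.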
