Abstract

Video-based person re-identification aims to associate video clips of the same identity by designing discriminative and representative features. Existing approaches typically compute clip representations by aggregating frame-level or region-level features, which leaves fine-grained local information inaccessible. To address this issue, we propose a novel module called fine-grained fusion with distractor suppression (FFDS), which fully exploits local features to better represent a given video clip. Concretely, in the proposed FFDS module, the importance of each local feature of an anchor image is calculated by pixel-wise correlation mining with the other intra-sequence frames. In this way, 'good' local features that co-exist across the video frames are enhanced in the attention map, while sparse 'distractors' are suppressed. Moreover, to maintain the high-level semantic information of deep CNN features while still benefiting from the fine-grained local information, we adopt a feature mimicking scheme during training. Extensive experiments on two challenging large-scale datasets demonstrate the effectiveness of the proposed method.
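
The correlation-based weighting described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `correlation_attention`, the use of cosine similarity, the max-then-mean aggregation over the other frames, and the softmax normalisation are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def correlation_attention(feats, anchor=0):
    """Hypothetical sketch: weight each anchor location by how well it
    correlates with locations in the other frames of the same clip.

    feats: array of shape (T, C, H, W), one feature map per frame.
    Returns (attn, pooled): an (H, W) attention map summing to 1 and a
    (C,) attention-weighted feature of the anchor frame.
    """
    T, C, H, W = feats.shape
    if T < 2:
        raise ValueError("need at least two frames to compute correlations")

    # Flatten spatial dimensions and L2-normalise every local feature vector.
    local = feats.reshape(T, C, H * W).transpose(0, 2, 1)        # (T, HW, C)
    local = local / (np.linalg.norm(local, axis=-1, keepdims=True) + 1e-12)

    a = local[anchor]                                            # (HW, C)
    others = np.delete(local, anchor, axis=0)                    # (T-1, HW, C)

    # Cosine similarity between each anchor location p and each location q
    # of every other frame t.
    sim = np.einsum("pc,tqc->tpq", a, others)                    # (T-1, HW, HW)

    # Assumed aggregation: best match within each frame, averaged over frames.
    # Locations with no counterpart elsewhere (distractors) score low.
    score = sim.max(axis=2).mean(axis=0)                         # (HW,)

    # Softmax over locations gives the attention weights.
    w = np.exp(score - score.max())
    w = w / w.sum()

    attn = w.reshape(H, W)
    pooled = (feats[anchor].reshape(C, H * W) * w).sum(axis=1)   # (C,)
    return attn, pooled
```

In this sketch, a local feature that appears only in the anchor frame receives a lower weight than one that recurs across the clip, which mirrors the suppression of sparse distractors described above.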

Highlights

  • The aim of person re-identification (Re-ID) is to associate images with the same identity across different camera views

  • We propose a novel module called fine-grained fusion with distractor suppression (FFDS) to fully exploit the detailed information of local features

  • We propose the fine-grained fusion with distractor suppression (FFDS) module to alleviate the influence of distractors while enhancing the fine-grained local features that co-exist across the frames of the video clip

Summary

Introduction

The aim of person re-identification (Re-ID) is to associate images with the same identity across different camera views. With promising applications in smart video surveillance, human-computer interaction, and other areas, great effort has been devoted to this field. Nevertheless, person re-identification is far from being solved. The main challenges come from large variations in human pose, camera view, and illumination, and from similar clothing across identities. Because videos contain richer information and are more in line with practical scenarios, video-based person re-identification has evolved into an important branch of the Re-ID community. Inspired by the attention mechanism, the authors of [1] proposed to jointly discover a diverse set of distinctive body parts and assign them different weights across different frames. In [2], intra and inter sequence attentions are utilized
