Fine-Grained Fusion With Distractor Suppression for Video-Based Person Re-Identification

Jiali Xi, Shibao Zheng, Qin Zhou, Yiru Zhao

doi:10.1109/access.2019.2932102

Open Access

https://doi.org/10.1109/access.2019.2932102

Abstract

Video-based person re-identification aims to associate video clips of the same identity by designing discriminative and representative features. Existing approaches simply compute representations for video clips via frame-level or region-level feature aggregation, where fine-grained local information is inaccessible. To address this issue, we propose a novel module called fine-grained fusion with distractor suppression (FFDS for short) to fully exploit local features for a better representation of a specific video clip. Concretely, in the proposed FFDS module, the importance of each local feature of an anchor image is calculated by pixel-wise correlation mining with the other intra-sequence frames. In this way, 'good' local features that co-exist across the video frames are enhanced in the attention map, while sparse 'distractors' are suppressed. Moreover, to maintain the high-level semantic information of deep CNN features while also benefiting from the fine-grained local information, we adopt a feature mimicking scheme during training. Extensive experiments on two challenging large-scale datasets demonstrate the effectiveness of the proposed method.
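The correlation-based weighting described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `correlation_attention` and `fuse`, the use of cosine similarity, the max-then-mean pooling of correlations, and the `temperature` parameter are all assumptions chosen for illustration. The idea it shows is that an anchor location whose local feature reappears in the other frames of the sequence receives a high weight, while a location with no counterpart (a 'distractor') receives a low one.

```python
import numpy as np

def correlation_attention(anchor, others, temperature=0.1):
    """Illustrative attention map for one anchor frame (assumed formulation).

    anchor: (C, H, W) feature map of the anchor frame.
    others: (T, C, H, W) feature maps of the other frames in the sequence.
    Returns an (H, W) map of non-negative weights that sum to 1.
    """
    c, h, w = anchor.shape
    # Flatten spatial positions and L2-normalise each local feature.
    a = anchor.reshape(c, h * w).T                                  # (HW, C)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    o = others.reshape(others.shape[0], c, -1).transpose(0, 2, 1)   # (T, HW, C)
    o = o / (np.linalg.norm(o, axis=2, keepdims=True) + 1e-12)

    # Cosine similarity between every anchor location and every
    # location of every other frame.
    sim = np.einsum("ic,tjc->tij", a, o)                            # (T, HW, HW)

    # Assumption: a location is "supported" by a frame if it has at least
    # one strong match there, so take the best match per frame and then
    # average the support over frames.
    score = sim.max(axis=2).mean(axis=0)                            # (HW,)

    # Softmax over locations (numerically stabilised).
    logits = score / temperature
    logits = logits - logits.max()
    attn = np.exp(logits)
    attn = attn / attn.sum()
    return attn.reshape(h, w)

def fuse(anchor, attn):
    """Attention-weighted sum of the anchor's local features, shape (C,)."""
    return (anchor * attn[None, :, :]).sum(axis=(1, 2))
```

In this sketch a local feature that appears in only one frame gets a lower score than features shared across frames, which mirrors the suppression of sparse distractors described above. The feature mimicking scheme used during training is not covered here.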

Highlights

The aim of person re-identification (Re-ID) is to associate images with the same identity across different camera views
We propose a novel module called fine-grained fusion with distractor suppression (FFDS) to fully exploit the detailed information of local features
We propose the fine-grained fusion with distractor suppression (FFDS) module to alleviate the influence of distractors while enhancing the fine-grained local features that co-exist across the frames of the video clip

Summary

Introduction

The aim of person re-identification (Re-ID) is to associate images with the same identity across different camera views. With promising applications in smart video surveillance, human-computer interaction, and related areas, great effort has been devoted to this field. Nevertheless, person re-identification is far from being solved. The main challenges come from large variations in human pose, camera view, and illumination, and from similar clothing across identities. Videos contain richer information and are more in line with practical scenarios, so video-based person re-identification has evolved into an important branch of the Re-ID community. Inspired by the attention mechanism, the authors of [1] proposed to jointly discover a diverse set of distinctive body parts and assign them different weights across frames. In [2], intra- and inter-sequence attentions are utilized
