Person Re-Identification Based on Two-Stream Network With Attention and Pose Features

Xiaowei Gong, Suguo Zhu

doi:10.1109/access.2019.2935116

Open Access

https://doi.org/10.1109/access.2019.2935116

Abstract

Person re-identification (Re-ID) remains a challenging task because of posture variation, blurring, occlusion, and other problems. In this paper, we combine the advantages of pose estimation and the attention mechanism in a two-stream network to address these problems. Our proposed method consists mainly of two parts. 1) Spatial Features with Multi-Layer Feature Fusion and Attention: the same pedestrian presents different postures under different camera angles, so simple spatial information alone is no longer reliable. It therefore becomes important to distinguish view-invariant features from multiple semantic levels. Consequently, we fuse the mid-level and high-level features and then correlate global information through self-attention. Because the fused features carry richer semantic information, the attention mechanism can better focus on the important areas of the image. 2) Aggregation of Attention Stream and Pose Estimation Stream Features: although the self-attention mechanism can automatically attend to the important areas of the image, it may focus too much on the prominent parts of the body and ignore the edge information of the body. Hence, the guidance of pedestrian posture is needed so that self-attention attends to all parts of the body. Finally, we use bilinear pooling to aggregate the features of the two streams into the final features. Without any data augmentation or re-ranking, our method achieves rank-1 accuracy of 93.3% and 85.5% on the Market1501 and DukeMTMC-reID datasets, respectively, which indicates its effectiveness.
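The final aggregation step can be illustrated with a minimal sketch of bilinear pooling, written in plain Python. It assumes each stream has already been reduced to a single feature vector. The function name `bilinear_pool`, the signed square root, and the L2 normalization are assumptions borrowed from common bilinear pooling practice, not details confirmed by the abstract.

```python
import math

def bilinear_pool(attention_feat, pose_feat):
    """Fuse two feature vectors via their flattened outer product.

    Hypothetical helper: the normalization steps are assumed, not taken
    from the paper.
    """
    # The outer product captures every pairwise interaction between
    # an attention-stream channel and a pose-stream channel.
    outer = [a * p for a in attention_feat for p in pose_feat]
    # Signed square root dampens very large interaction values.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    # L2 normalization puts the fused descriptor on the unit sphere.
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0 else signed

# Toy 3-dimensional attention feature and 2-dimensional pose feature.
fused = bilinear_pool([1.0, 0.0, -4.0], [4.0, 1.0])
print(fused)  # a 6-dimensional fused descriptor
```

The fused descriptor has one entry per pair of channels, so its length is the product of the two input lengths. In a convolutional network, the outer product would typically be computed at each spatial location and then summed or averaged over locations.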

Highlights

Person re-identification (Re-ID) is a technology that uses computer vision to judge whether a specific pedestrian is present in an image or video sequence
The reason may be that the Market1501 dataset was collected in summer, with obvious gender characteristics and a relatively large gap between classes, while the DukeMTMC-reID dataset was collected in winter, with less obvious gender differences and a relatively small gap between classes
Through a series of experiments and comparisons, we found that the attention mechanism can automatically focus on the prominent areas of the human body but may ignore some edge parts, whereas pose can effectively guide attention to the right parts of the body, so we combine pose and attention through a two-stream network

Summary

Introduction

Person re-identification (Re-ID) is a technology that uses computer vision to judge whether a specific pedestrian is present in an image or video sequence. It is widely considered a sub-problem of image retrieval: given an image of a pedestrian captured by one camera, the task is to retrieve images of the same pedestrian captured by other cameras. Person re-identification aims to compensate for the limited field of view of fixed cameras and can be combined with pedestrian detection and tracking, which makes it widely applicable in video surveillance, security, and other fields. With the continuous development of deep learning, a growing body of work applies deep learning methods to the person re-identification task.
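The retrieval view of Re-ID, and the rank-1 accuracy used to evaluate it, can be sketched as follows. This is a simplified illustration rather than the official evaluation code of either dataset. The names `cosine` and `rank1_accuracy`, the toy features, and the use of cosine similarity are all assumptions. Skipping gallery images that share both identity and camera with the query follows a common convention in Re-ID evaluation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank1_accuracy(queries, gallery):
    """queries and gallery are lists of (person_id, camera_id, feature)."""
    hits = 0
    for q_id, q_cam, q_feat in queries:
        # Cross-camera retrieval: skip gallery images showing the same
        # person from the same camera as the query.
        candidates = [
            (cosine(q_feat, g_feat), g_id)
            for g_id, g_cam, g_feat in gallery
            if not (g_id == q_id and g_cam == q_cam)
        ]
        # Rank-1 asks whether the single most similar image is correct.
        best_id = max(candidates, key=lambda c: c[0])[1]
        hits += best_id == q_id
    return hits / len(queries)

# Toy data: two identities seen by two cameras.
gallery = [(1, 2, [1.0, 0.1]), (2, 2, [0.0, 1.0]), (1, 1, [1.0, 0.0])]
queries = [(1, 1, [1.0, 0.0]), (2, 1, [0.9, 0.2])]
print(rank1_accuracy(queries, gallery))  # 0.5
```

In this toy run the first query retrieves the correct identity, while the second is more similar to an image of person 1 than to the gallery image of person 2, so one of the two queries counts as a hit.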

Objectives

Methods

Results

Conclusion