Abstract
Person re-identification has received extensive attention in the academic community. In this paper, a novel multiple feature fusion network (MPFF-Net) is proposed for video-based person re-identification. The network produces a robust and discriminative feature representation of the pedestrian in a video, combining hand-crafted and deep-learned components. First, image-level features are extracted from all consecutive frames. The hand-crafted branch then uses these descriptors to compute the average feature of the video and frame-to-frame difference information. The deep-learned branch is based on a bidirectional LSTM (BiLSTM) network; it aggregates frame-wise representations of human regions into sequence-level features, and the problem of misalignment is also taken into account in this branch. Finally, the hand-crafted and deep-learned parts are regarded as complementary, and fusing them helps capture the complete information of the video. Extensive experiments are conducted on the iLIDS-VID, PRID2011 and MARS datasets. The results demonstrate that the proposed algorithm outperforms state-of-the-art video-based re-identification methods.
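The hand-crafted branch and the fusion step described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the descriptor dimension, the use of absolute mean differences to summarize frame-to-frame variation, and concatenation as the fusion operator are all assumptions, since the abstract does not specify these details.

```python
import numpy as np

def hand_crafted_branch(frames):
    """frames: (T, D) array of image-level descriptors for T consecutive frames.

    Returns a video-level descriptor combining the average feature with
    frame-to-frame difference information (aggregation scheme is an assumption).
    """
    mean_feat = frames.mean(axis=0)                 # average feature of the video
    diffs = np.diff(frames, axis=0)                 # (T-1, D) frame-to-frame differences
    diff_feat = np.abs(diffs).mean(axis=0)          # summarized difference information
    return np.concatenate([mean_feat, diff_feat])   # shape (2*D,)

def fuse(hand_crafted_feat, deep_learned_feat):
    """Fuse the two complementary parts; concatenation is a placeholder choice."""
    return np.concatenate([hand_crafted_feat, deep_learned_feat])

# Example: 8 frames with 128-dim descriptors, and a hypothetical 256-dim
# sequence-level feature standing in for the BiLSTM branch output.
frames = np.random.rand(8, 128)
hc = hand_crafted_branch(frames)        # shape (256,)
deep = np.random.rand(256)              # stand-in for the deep-learned branch
video_feat = fuse(hc, deep)             # shape (512,)
```

In practice the deep-learned part would come from the BiLSTM branch over the same frame-wise descriptors; the stand-in vector above only illustrates the fusion interface.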