Abstract

Video-based person re-identification is an important task in multi-camera visual sensor networks, challenged by lighting variation, low-resolution images, background clutter, occlusion, and similarity in human appearance. In this paper, we propose a video-based person re-identification method: an end-to-end learning architecture with hybrid deep appearance-temporal features. It can learn the appearance features of pivotal frames, the temporal features, and an independent distance metric for each feature type. The architecture consists of a two-stream deep feature structure and two Siamese networks. In the first stream, we propose the Two-branch Appearance Feature (TAF) sub-structure to obtain the appearance information of persons, and use one of the two Siamese networks to learn the similarity of the appearance features of a pair of persons. To utilize temporal information, we design the second stream, consisting of the Optical flow Temporal Feature (OTF) sub-structure and the other Siamese network, to learn a person's temporal features and the distances between pairwise features. In addition, we select the pivotal frames of a video as inputs to the Inception-V3 network in the TAF sub-structure, and employ a salience-learning fusion layer to fuse the learned global and local appearance features. Extensive experiments on the PRID2011, iLIDS-VID, and Motion Analysis and Re-identification Set (MARS) datasets show that the proposed architecture reaches 79%, 59%, and 72% at Rank-1, respectively, and has advantages over state-of-the-art algorithms. It also improves the ability to represent person features.
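To make the Siamese comparison used in both streams concrete, the following is a minimal NumPy sketch, not the authors' implementation: the shared `embed` projection is a stand-in for the deep appearance/temporal branches, and the contrastive loss is one common choice for learning a distance over pairs (all names here are illustrative assumptions).

```python
import numpy as np

def embed(x, W):
    # Shared-weight embedding: both inputs of a Siamese pair pass
    # through the same projection (stand-in for the deep branches).
    return np.tanh(W @ x)

def siamese_distance(x1, x2, W):
    # Euclidean distance between the two embeddings of the pair.
    f1, f2 = embed(x1, W), embed(x2, W)
    return float(np.linalg.norm(f1 - f2))

def contrastive_loss(d, same, margin=1.0):
    # Pull same-identity pairs together; push different-identity
    # pairs apart until they exceed the margin.
    return d ** 2 if same else max(margin - d, 0.0) ** 2

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))        # toy embedding weights
a, b = rng.standard_normal(16), rng.standard_normal(16)
d = siamese_distance(a, b, W)
print(d, contrastive_loss(d, same=True))
```

In the paper's architecture one such Siamese network operates on the TAF appearance features and a second, independent one on the OTF temporal features, so each feature type gets its own metric.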

Highlights

  • Person re-identification aims at matching a target person across non-overlapping cameras at different times or different locations

  • We propose a Two-branch Appearance Feature (TAF) sub-structure consisting of the walking cycle model, the two-branch Inception-V3 network, and the saliency learning fusion layer, which is used to learn the global and local appearance features of persons

  • For the PRID-2011 dataset, we compared the performance of our proposed architecture with twelve state-of-the-art methods, including discriminative selection and ranking (DVR) [2], DVDL [42], STFV3D [3], RMLLC-SLF [43], TDL [4], RFA [44], convolutional neural network (CNN)-recurrent neural network (RNN) [5], CNN-BRNN [9], CRF [10], ASTPN [22], TSSCN [11], and TAM-SRM [45]
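The salience-learning fusion layer mentioned above can be pictured as a learned weighting of the global feature and the local (per-part) features. The sketch below is a toy illustration of that idea under assumed names (`salience_fuse`, the softmax-normalized `scores`); it is not the paper's actual layer.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the salience scores.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def salience_fuse(global_feat, local_feats, scores):
    # Salience-style fusion: normalized scores weight the global
    # feature and each local feature, which are summed into one
    # appearance descriptor.
    feats = [np.asarray(global_feat)] + [np.asarray(f) for f in local_feats]
    w = softmax(np.asarray(scores, dtype=float))
    return sum(wi * f for wi, f in zip(w, feats))

g = np.ones(4)                     # toy global feature
parts = [2 * np.ones(4), 3 * np.ones(4)]  # toy local features
fused = salience_fuse(g, parts, scores=[0.0, 0.0, 0.0])
print(fused)  # equal scores give equal weights, i.e. the mean feature
```

In a trained network the scores would be produced by a learnable layer, so salient regions contribute more to the fused appearance feature.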

Introduction

Person re-identification (person Re-ID) aims at matching a target person across non-overlapping cameras at different times or locations. It has important significance in video surveillance systems and the public security field, but remains a crucial challenge in multi-camera visual sensor networks [1]. Because multi-camera visual sensor networks capture video clips of the target person, research on video-based person re-identification is necessary and inevitable for public safety. Video-based person re-identification is the task of matching a person using a sequence of images/tracklets. An increasing number of existing research works [2,3,4,5] focus on video-based person re-identification. As the probe video and gallery videos are taken from different cameras, they may suffer from inherent challenges such as lighting variations.

Sensors 2018, 18, 3669; doi:10.3390/s18113669 www.mdpi.com/journal/sensors

