Abstract
Recently, person re-identification (ReID) has become a research hotspot in computer vision and has received extensive attention in the academic community. Inspired by part-based research in image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification, namely the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, E-GLRN extracts holistic and local features simultaneously. Specifically, for global feature learning, we combine a channel attention convolutional neural network (CNN) with bidirectional long short-term memory (Bi-LSTM) networks into a CNN-LSTM module that learns features from consecutive frames. The local feature learning module, also based on Bi-LSTM networks, extracts key local information. To obtain local features more effectively, we define the concept of "the main image group", a set of three representative frames selected from the sequence, and obtain the local feature representation of a video by exploiting the spatial contextual and appearance information of this group. The extracted local and global features are complementary and are combined into a discriminative and robust representation of the video sequence. Extensive experiments on three video-based ReID datasets, iLIDS-VID, PRID2011 and MARS, demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
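The global/local pipeline in the abstract can be sketched with NumPy. This is a minimal, hypothetical illustration, not E-GLRN itself: the channel attention uses a squeeze-and-excitation style gate (an assumed form), temporal averaging stands in for the Bi-LSTM branches, the three frames of the main image group are chosen at evenly spaced positions (the paper's selection criterion is not given here, so this rule is an assumption), horizontal-stripe pooling stands in for local feature learning, and fusion is a concatenation of L2-normalised vectors. All function names and weights are hypothetical; the input is assumed to be a stack of per-frame CNN feature maps of shape (K, C, H, W).

```python
import numpy as np


def channel_attention(fmap, w1, w2):
    """SE-style channel gate (assumed form). fmap: (C, H, W)."""
    squeeze = fmap.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)         # bottleneck + ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid weights in (0, 1) -> (C,)
    return fmap * gate[:, None, None]              # rescale each channel


def global_feature(fmaps, w1, w2):
    """Stand-in for the CNN-Bi-LSTM branch: attend, pool each frame, average over time."""
    per_frame = np.stack([channel_attention(f, w1, w2).mean(axis=(1, 2)) for f in fmaps])
    return per_frame.mean(axis=0)                  # (C,)


def select_main_group(num_frames, k=3):
    """Hypothetical selection rule: k evenly spaced frame indices."""
    return np.linspace(0, num_frames - 1, k).round().astype(int)


def local_feature(fmaps, idx, stripes=4):
    """Stand-in for the local branch: horizontal-stripe pooling over the main image group."""
    group = fmaps[idx]                             # (k, C, H, W)
    bands = np.array_split(group, stripes, axis=2) # split along the height axis
    pooled = np.stack([b.mean(axis=(0, 2, 3)) for b in bands])  # (stripes, C)
    return pooled.reshape(-1)                      # (stripes * C,)


def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)


def video_representation(fmaps, w1, w2):
    """Concatenate normalised global and local features (assumed fusion rule)."""
    glob = global_feature(fmaps, w1, w2)
    loc = local_feature(fmaps, select_main_group(len(fmaps)))
    return np.concatenate([l2_normalize(glob), l2_normalize(loc)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, C, H, W, r = 8, 16, 8, 4, 4                 # frames, channels, height, width, reduction
    fmaps = rng.standard_normal((K, C, H, W))      # random stand-in for CNN feature maps
    w1 = 0.1 * rng.standard_normal((C // r, C))
    w2 = 0.1 * rng.standard_normal((C, C // r))
    print(select_main_group(K))                    # frame indices 0, 4 and 7
    print(video_representation(fmaps, w1, w2).shape)  # 16 global + 64 local dimensions
```

Keeping the two halves separately normalised before concatenation stops either branch from dominating the fused vector simply because it has more dimensions; a learned fusion layer would be an equally plausible choice.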
Highlights
Person re-identification (ReID) is one of the most important and popular fields of computer vision, with tremendous application potential in video surveillance [1]–[5].
Given a video sequence $I = \{I_1, I_2, \ldots, I_K\}$ with $K$ frames, our framework is divided into four parts: global feature learning over the video, representative frame extraction, local feature learning and overall feature representation learning.
Because a query has multiple ground truths, we evaluate performance on MARS with the average cumulative match characteristic (CMC) curve and the mean average precision (mAP).
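The evaluation described in the last highlight can be sketched as follows. The function `evaluate` and its arguments are hypothetical, and the sketch simplifies common ReID evaluation scripts by omitting their camera-based filtering of gallery entries. It shows the difference between the two metrics: the CMC curve records only the rank of the first correct match, while average precision accounts for every ground truth of a query, which is why mAP is reported for MARS, where a query has several matches.

```python
import numpy as np


def evaluate(dist, query_ids, gallery_ids, max_rank=5):
    """Return (CMC curve of length max_rank, mAP) for a query x gallery distance matrix.

    Simplified sketch: no same-camera filtering is applied.
    """
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    cmc = np.zeros(max_rank)
    aps = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                        # gallery sorted nearest first
        matches = (gallery_ids[order] == query_ids[i]).astype(float)
        if matches.sum() == 0:
            continue                                       # no ground truth: skip this query
        first_hit = int(np.argmax(matches))                # 0-based rank of first correct match
        if first_hit < max_rank:
            cmc[first_hit:] += 1                           # a hit at this rank and all later ranks
        precision = np.cumsum(matches) / np.arange(1, len(matches) + 1)
        aps.append(precision[matches == 1].mean())         # AP averages over every ground truth
    if not aps:
        raise ValueError("no query has a ground-truth match in the gallery")
    return cmc / len(aps), float(np.mean(aps))


if __name__ == "__main__":
    dist = np.array([[0.1, 0.5, 0.3, 0.9],
                     [0.2, 0.7, 0.4, 0.6]])
    cmc, mean_ap = evaluate(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1, 2], max_rank=3)
    print(cmc)                 # rank-1 0.5, rank-2 0.5, rank-3 1.0
    print(round(mean_ap, 4))   # 0.7083
```

In this toy case the first query retrieves both of its matches at ranks 1 and 2 (AP = 1), while the second finds its matches only at ranks 3 and 4 (AP = 5/12), so rank-1 accuracy is 0.5 and mAP is 17/24.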
Summary
Person re-identification (ReID) is one of the most important and popular fields of computer vision, with tremendous application potential in video surveillance [1]–[5]. Most ReID studies have appeared over the past few years, and they fall into two classes: designing a robust metric learning method or developing a discriminative feature. Metric learning research focuses on learning a distance function [6]–[9] under which two features from the same pedestrian lie closer together than features from different pedestrians. Feature learning aims to build an effective and discriminative representation of pedestrians. Among hand-crafted features, low-level descriptors such as color and texture histograms are widely used in ReID. With the extensive application of convolutional neural networks (CNNs) to visual classification, some work regards ReID as a multi-class …
The associate editor coordinating the review of this article and approving it for publication was Yongming Li.
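To make the metric-learning goal concrete, the sketch below uses a triplet margin loss, one common way to enforce that features of the same pedestrian lie closer than features of different pedestrians. It is a representative technique chosen for illustration, not necessarily the method of [6]–[9] or of this paper; the function names and the margin of 0.3 are assumptions.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss max(0, d(a, p) - d(a, n) + margin).

    It is zero once the negative (another pedestrian) is at least `margin`
    farther from the anchor than the positive (same pedestrian).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.1, 0.0]
    easy_negative, hard_negative = [1.0, 0.0], [0.2, 0.0]
    print(triplet_loss(anchor, positive, easy_negative))            # 0.0: constraint satisfied
    print(round(triplet_loss(anchor, positive, hard_negative), 4))  # 0.2: negative too close
```

Only triplets that violate the margin produce a gradient, so training concentrates on hard negatives such as different pedestrians in similar clothing.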