Abstract
Person re-identification (re-id) is an important task in public security and attracts growing research interest because of its practical value. Most person re-id models address the image-based or video-based re-id problem. Image-to-video person re-id, however, is important for locating lost persons, tracking criminals, and retrieving pedestrian video. The key challenge in image-to-video person re-id is to build an accurate connection between appearance features from images and spatio-temporal features from videos despite the large cross-media gap between the two modalities. Although existing image-to-video person re-id models are effective, they remain far from practical application. These methods only measure the similarity of cross-media features extracted from the whole original image/video, without weighting regions by importance. However, the most useful and discriminative information is contained in human body parts (torso, elbow, wrist, knee, and ankle), while the backgrounds of pedestrian images/videos carry a lot of useless information. In this paper, we present a Cross-media Body-part Attention Network (CBAN) for image-to-video person re-id, which extracts cross-media body-part attention features from images/videos (by CNN/LSTM) and simultaneously ignores useless background information by using a part attention mechanism. In addition, our network alleviates the inherent cross-media gap through a novel media-pulling constraint term. Extensive experiments are conducted on three large-scale datasets (Market1501, Mars and CUHK03) and two small datasets (PRID-2011, iLIDS-VID), and the results show that our CBAN approach can solve the image-to-video person re-id problem effectively with a body-part attention mechanism.
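The abstract does not give the attention formula, so the following is only a generic sketch of what attention over body parts can look like: per-part features are combined with softmax weights, so that low-scoring regions contribute little to the pooled feature. The function names, the toy scores, and the idea of treating the background as one more scored region are illustrative assumptions, not the CBAN implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(part_features, part_scores):
    """Softmax-weighted sum of per-part feature vectors.

    part_features: list of equal-length feature vectors, one per region.
    part_scores:   one raw attention score per region (hypothetical values;
                   in a real network these would be learned).
    """
    weights = softmax(part_scores)
    dim = len(part_features[0])
    pooled = [0.0] * dim
    for weight, feature in zip(weights, part_features):
        for i in range(dim):
            pooled[i] += weight * feature[i]
    return weights, pooled


# Toy example: two body parts and one background region (assumed scores).
features = [
    [1.0, 0.0],  # torso
    [0.0, 1.0],  # knee
    [5.0, 5.0],  # background
]
scores = [2.0, 1.0, -3.0]
weights, pooled = attention_pool(features, scores)
print(weights)
print(pooled)
```

With these toy scores the background region receives the smallest weight, so its large feature values barely affect the pooled vector.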
Highlights
Person re-identification is one of the most important research topics in video surveillance; it aims to find the correct pedestrian in a gallery set given a probe image/video
The results demonstrate that the Cross-media Body-part Attention Network (CBAN) approach is superior to the state of the art, which shows that the body-part attention mechanism can effectively improve the performance of person re-id
Experimental results: we show the performance of our Cross-media Body-part Attention Network on five widely used benchmarks
Summary
Person re-identification (re-id) is one of the most important research topics in video surveillance; it aims to find the correct pedestrian in a gallery set given a probe image/video. Depending on the application scenario, person re-id models are divided by matching type into image-based [1], [7], [21], [46], video-based [6], [18], [30], and image-to-video based [32], [43] person re-id. Image-based person re-id aims to recognize the same person identity across images captured by different cameras (denoted as probe and gallery), while video-based re-id methods search for the target person identity in gallery videos given a probe video or image set. Image-to-video person re-id has been widely used in criminal tracking and in locating lost persons. In these applications, the available images are often noisy and lack sufficient information to support image-based person re-id methods. Exploring image-to-video person re-id therefore has great practical significance.
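The probe-to-gallery search described above can be sketched as a nearest-neighbour ranking. This is a minimal illustration only: it assumes that each probe image and each gallery video has already been reduced to a fixed-length feature vector in a shared space, and the identities, vectors, and function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(probe_feature, gallery_features):
    """Return gallery identities ordered from closest to farthest from the probe.

    probe_feature:    feature vector of the probe image.
    gallery_features: dict mapping identity -> feature vector of a gallery video.
    """
    probe = l2_normalize(probe_feature)
    scored = [
        (euclidean(probe, l2_normalize(feature)), identity)
        for identity, feature in gallery_features.items()
    ]
    scored.sort()  # smallest distance first
    return [identity for _, identity in scored]


# Toy data: one probe image feature and three gallery video features.
probe = [0.9, 0.1, 0.0]
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [0.0, 0.0, 1.0],
}
ranking = rank_gallery(probe, gallery)
print(ranking)
```

The first entry of `ranking` is the gallery identity judged most similar to the probe; re-id benchmarks typically score a method by how often the correct identity appears near the top of such a ranking.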