Abstract

Siamese networks are prevalent in person re-identification (re-id) tasks, where they model the similarity and dissimilarity between video sequences. However, they mainly focus on the inter-video variation between spatio-temporal features extracted from different videos, while the variation among features of the same video has rarely been discussed. In this paper, we introduce the concept of a "mean-body" and define an intra-video loss that addresses the variation among spatio-temporal features of the same video. A novel loss is then presented that boosts the training of re-id networks by combining the proposed intra-video loss with the Siamese loss. Specifically, the intra-video loss uses the unique mean-body of each camera viewpoint to make the features of a video sequence more clustered, while the Siamese loss makes wrongly matched videos more separated. To train the whole network, we update the network parameters and the mean-bodies in an iterative manner. As a result, the proposed loss is expected to improve the generalization capability of re-id networks on the test set. Extensive experiments demonstrate that the presented approach outperforms state-of-the-art algorithms in re-id accuracy on publicly available data sets such as PRID2011, iLIDS-VID, and MARS.
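The combined objective described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the weighting `lam`, the contrastive `margin`, and the use of a simple batch mean as the mean-body are assumptions for clarity (the paper updates the mean-bodies iteratively alongside the network).

```python
import numpy as np

def intra_video_loss(features, mean_body):
    """Pull each frame feature of one video toward that video's mean-body.

    features: (num_frames, dim) array; mean_body: (dim,) array.
    Returns the mean squared distance to the mean-body.
    """
    return np.mean(np.sum((features - mean_body) ** 2, axis=1))

def siamese_loss(fa, fb, same, margin=2.0):
    """Contrastive Siamese term on a pair of video-level features:
    pull matched pairs together, push mismatched pairs beyond `margin`."""
    d = np.linalg.norm(fa - fb)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def combined_loss(features_a, features_b, same, lam=0.5, margin=2.0):
    """Hypothetical combination of the two terms for one video pair.

    Here the mean-body is approximated by the batch mean of each video's
    frame features; in the paper it is maintained per camera viewpoint
    and refreshed in an iterative training loop.
    """
    ma = features_a.mean(axis=0)
    mb = features_b.mean(axis=0)
    intra = intra_video_loss(features_a, ma) + intra_video_loss(features_b, mb)
    pair = siamese_loss(ma, mb, same, margin)
    return pair + lam * intra
```

In this sketch, the intra-video term shrinks each sequence's feature cloud around its mean-body, while the Siamese term acts on the pair of mean-bodies, matching the clustering/separation roles described above.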
