Abstract

In video-based person re-identification, input sequences currently contain frames that differ only subtly and are highly redundant, because frame extraction is not guided effectively. Although some studies have proposed extracting key frames first, they have not joined key frame extraction with person re-identification. Consequently, it is difficult to evaluate whether the extracted key frames are effective for person re-identification. In this paper, we introduce an End-to-end Network Embedding Unsupervised Key Frame Extraction (EKEN) to address these problems. First, we design a key frame extraction module and train it to extract key frames using pseudo labels generated by hierarchical clustering. Second, we embed the key frame extraction module into the person re-identification task. The results of key frame extraction and person re-identification are fed back to each other promptly, and this instant feedback promotes the synchronized optimization of the two modules. On the MARS dataset, the mAP achieved by our method is 0.7%, 2.9%, 2.1% and 2.3% higher than that of methods based on Random, Evenly, Cluster and Frame difference sampling, respectively. In particular, our method is better suited to real-world applications than existing methods.
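To make the pseudo-labelling step concrete, the following is a minimal sketch of how hierarchical clustering could mark candidate key frames in a sequence. It is an illustration under stated assumptions, not the EKEN implementation: the abstract does not say which linkage is used or how a key frame is chosen within a cluster, so the average linkage, the nearest-to-centroid rule, the toy 2-D features, and all function names here are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(features, members):
    """Mean feature vector of the frames whose indices are in `members`."""
    dim = len(features[0])
    return [sum(features[i][d] for i in members) / len(members) for d in range(dim)]


def agglomerative_clusters(features, num_clusters):
    """Bottom-up hierarchical clustering with average linkage (assumed choice).

    Starts with one cluster per frame and repeatedly merges the two closest
    clusters until `num_clusters` remain. Returns lists of frame indices.
    """
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_sum = sum(
                    euclidean(features[i], features[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                dist = pair_sum / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def pseudo_key_frame_labels(features, num_clusters):
    """Label one frame per cluster as a key frame (1) and the rest as 0.

    Assumption: the frame closest to its cluster centroid is the key frame.
    """
    labels = [0] * len(features)
    for members in agglomerative_clusters(features, num_clusters):
        center = centroid(features, members)
        key = min(members, key=lambda i: euclidean(features[i], center))
        labels[key] = 1
    return labels


# Toy per-frame features standing in for learned frame embeddings.
frames = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],   # near-duplicate frames
    [5.0, 5.0], [5.2, 5.0], [5.1, 5.0],     # a second group of similar frames
    [9.0, 0.0],                             # an isolated frame
]
labels = pseudo_key_frame_labels(frames, num_clusters=3)
print(labels)
```

In a pipeline of the kind the abstract describes, labels like these would serve as training targets for the key frame extraction module; the number of clusters is a free parameter in this sketch.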
