Abstract

Recently, person re-identification techniques have been successfully applied in many fields, such as suspect tracking and locating missing persons. Since video contains richer information than still images, more and more researchers focus on video-based person re-identification, especially image-to-video person re-identification (IVPR). However, most existing IVPR models operate under a supervised framework, and labeling enough training samples is labor-intensive, which limits their practical value. At the same time, the 2D features extracted from pedestrian images and the 3D features extracted from pedestrian videos are heterogeneous, which poses a significant challenge for the IVPR task. To effectively solve these problems, we propose an unsupervised domain adaptation image-to-video person re-identification model based on a cross-modal feature generating and target information preserving transfer network (CMGTN). On the one hand, the generator in our model not only transforms unlabeled target-domain sample features into the source-domain feature space, but also preserves target identity information. On the other hand, we bridge the gap between pedestrian images and videos by embedding a cross-modal loss term. To evaluate the performance of our approach, we conduct extensive experiments on the PRID-2011, iLIDS-VID and MARS datasets and compare our approach with state-of-the-art IVPR models, including four unsupervised methods and three supervised methods. Experimental results demonstrate the effectiveness of our approach.
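
The abstract names three objectives (adversarial transfer of target features into the source space, target identity preservation, and cross-modal image/video alignment) but gives no implementation details. The sketch below illustrates one plausible way these terms could be combined, in PyTorch; every class name, architecture choice, and loss weight here is a hypothetical assumption for illustration, not the authors' CMGTN implementation.

```python
# Hypothetical sketch of the three loss terms suggested by the abstract.
# All names and architectures are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGenerator(nn.Module):
    """Maps unlabeled target-domain features into the source feature space."""
    def __init__(self, dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, dim)
        )

    def forward(self, x):
        return self.net(x)

class DomainDiscriminator(nn.Module):
    """Judges whether a feature comes from the source domain or the generator."""
    def __init__(self, dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 512), nn.ReLU(inplace=True), nn.Linear(512, 1)
        )

    def forward(self, x):
        return self.net(x)

def generator_loss(G, D, img_feat_tgt, vid_feat_tgt, lam_id=1.0, lam_cm=1.0):
    """Adversarial transfer + identity preservation + cross-modal alignment.

    img_feat_tgt: 2D features of target pedestrian images, shape (B, dim).
    vid_feat_tgt: temporally pooled 3D features of target videos, shape (B, dim).
    Rows with the same index are assumed to share an identity.
    """
    g_img = G(img_feat_tgt)
    g_vid = G(vid_feat_tgt)
    # Adversarial term: generated features should be indistinguishable
    # from source-domain features according to the discriminator.
    logits = D(torch.cat([g_img, g_vid]))
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # Identity-preservation term: the transfer should not destroy the
    # identity information carried by the original target features.
    ident = F.mse_loss(g_img, img_feat_tgt) + F.mse_loss(g_vid, vid_feat_tgt)
    # Cross-modal term: image and video features of the same person are
    # pulled together, closing the 2D/3D heterogeneity gap.
    cross_modal = F.mse_loss(g_img, g_vid)
    return adv + lam_id * ident + lam_cm * cross_modal
```

In a full adversarial setup, this generator loss would alternate with a discriminator update that labels real source features as 1 and generated features as 0; the weights `lam_id` and `lam_cm` would be tuned on a validation split.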
