Person re-identification (ReID) is an essential technology for matching a person across non-overlapping cameras. It has attracted increasing attention in recent years due to its wide range of applications in real-world scenarios such as security surveillance and criminal investigation. Unlike image-based person ReID, video-based ReID uses a video clip as the retrieval input, which can yield better ReID performance because video carries rich information on appearance, motion cues, and pose variations along the temporal dimension. Over the last few years, many deep learning-based video person ReID methods have been proposed to address various challenges such as illumination variation, complex backgrounds, and occlusion. To provide a more comprehensive and readable review of existing video-based person ReID methods, we propose a novel taxonomy that organizes them from four perspectives: data, algorithms, computing power, and applications. Specifically, we first introduce popular datasets and evaluation criteria used for video-based person ReID. Next, from the perspective of limited data and scarce annotation, we introduce data augmentation and unsupervised ReID. From the algorithm perspective, we review supervised methods, including spatial, temporal, and spatio-temporal feature learning, and conduct a systematic comparison among these approaches. From the perspective of complex open-world applications, we summarize domain adaptation and multimodal ReID. From the perspective of insufficient GPU computing power, we discuss modality-agnostic unified large-scale ReID models and their lightweighting. Finally, we provide a discussion of open problems and potential research directions for the community.