Abstract
RGB-infrared (RGB-IR) person reidentification (re-ID) is a challenging problem in computer vision because of the large crossmodality difference between RGB and IR images. Most traditional methods perform only feature alignment, which ignores the distinct nature of modality differences and struggles to eliminate the large gap between RGB and IR. In this paper, a novel AGF network is proposed for the RGB-IR re-ID task, based on the idea of global and local alignment. The AGF network distinguishes pedestrians across modalities globally by combining pixel alignment and feature alignment, and it highlights more structural information of a person locally by weighting channels with SE-ResNet-50. It consists of three modules: an AlignGAN module (A), a crossmodality paired-images generation module (G), and a feature alignment module (F). First, at the pixel level, RGB images are converted into IR images through a pixel alignment strategy to directly reduce the crossmodality difference between RGB and IR images. Second, at the feature level, crossmodality paired images are generated by exchanging the modality-specific features of RGB and IR images to perform global set-level and fine-grained instance-level alignment. Finally, the SE-ResNet-50 network replaces the commonly used ResNet-50 network. By automatically learning the importance of different channel features, it strengthens the network's ability to extract fine-grained structural information of a person across modalities. Extensive experiments on the SYSU-MM01 dataset demonstrate that the proposed method outperforms state-of-the-art methods. In addition, we evaluate the proposed method on a stronger baseline, and the results show that an RGB-IR re-ID method performs better on a stronger baseline.
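The channel weighting that the abstract attributes to SE-ResNet-50 can be illustrated with a minimal squeeze-and-excitation sketch. This is not the authors' implementation: the function name `se_reweight`, the list-based feature map layout, the weight matrices `w1` and `w2`, and the omission of biases are all assumptions made to keep the example short and dependency-free.

```python
import math


def se_reweight(feature_map, w1, w2):
    """Squeeze-and-excitation channel reweighting for one feature map.

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: (C // r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C // r) weights of the expansion layer (hypothetical values).
    Biases are omitted in this sketch.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates
```

Calling `se_reweight` on a two-channel map returns the rescaled map together with one gate per channel. Channels whose gate is close to 1 pass through almost unchanged, while channels with a gate near 0 are suppressed, which is the sense in which the network learns the importance of different channel features.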
Highlights
Person reidentification is a process of retrieving the same target person from multiple different camera perspectives
Following [29], results on SYSU-MM01 are evaluated with the official code, which averages over 10 repeated random splits of the gallery and probe sets
The RGB images are converted into IR images by using the AlignGAN model, which reduces the crossmodality difference
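The repeated-split evaluation mentioned in the highlights can be sketched as follows. This is not the official SYSU-MM01 evaluation code: the function names, the use of squared Euclidean distance, the choice of one randomly sampled gallery image per identity in each trial, and the omission of any camera-based filtering are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def evaluate_once(queries, gallery):
    """Return (rank-1 accuracy, mAP) for one gallery split.

    queries, gallery: lists of (person_id, feature_vector) pairs.
    """
    rank1_hits, ap_sum, valid = 0, 0.0, 0
    for q_id, q_feat in queries:
        # Rank gallery entries from nearest to farthest.
        ranked = sorted(gallery, key=lambda g: sq_dist(q_feat, g[1]))
        matches = [g_id == q_id for g_id, _ in ranked]
        if not any(matches):
            continue  # skip queries with no true match in this gallery
        valid += 1
        rank1_hits += matches[0]
        # Average precision: mean of the precision at each true match.
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        return 0.0, 0.0
    return rank1_hits / valid, ap_sum / valid


def evaluate_repeated(queries, gallery_pool, trials=10, seed=0):
    """Average rank-1 and mAP over `trials` random gallery splits.

    gallery_pool: dict mapping person_id to a list of feature vectors.
    Each trial samples one gallery vector per identity (an assumption).
    """
    rng = random.Random(seed)
    rank1_total = map_total = 0.0
    for _ in range(trials):
        gallery = [
            (pid, rng.choice(feats)) for pid, feats in gallery_pool.items()
        ]
        rank1, mean_ap = evaluate_once(queries, gallery)
        rank1_total += rank1
        map_total += mean_ap
    return rank1_total / trials, map_total / trials
```

Averaging over several random splits reduces the variance that a single lucky or unlucky gallery sample would introduce into the reported rank-1 and mAP values.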
Summary
Person reidentification (re-ID) is the process of retrieving the same target person across multiple camera views. It is an important problem in video surveillance and is widely used in security and intelligent city applications. Re-ID has attracted increasing attention in computer vision [1,2,3,4,5,6]. Re-ID depends on good lighting conditions, which are not always available in the real world. At night or in dark environments, visible-light cameras cannot capture useful appearance information. Most surveillance cameras can automatically switch from visible (RGB) mode to near-infrared (IR) mode, which makes it possible to study the RGB-IR crossmodality matching problem in real scenes.