Person re-identification (Re-ID) is a crucial task in video surveillance, aiming to match person images across non-overlapping camera views. Recent methods introduce the near-infrared (NI) modality to alleviate the limitations of the traditional visible-light modality under low-light conditions, but they overlook the importance of modality-related information. To incorporate additional complementary information into traditional person Re-ID, this paper proposes a novel RGB-NI-TI multi-modal person re-identification approach. First, we design a multi-scale multi-modal interaction module to facilitate cross-modal information fusion across multiple scales. Second, we propose a low-rank multi-modal fusion module that decomposes features and weights in parallel and then employs low-rank modality-specific factors for multi-modal fusion, making the model more efficient in fusing features from multiple modalities while reducing complexity. Finally, we propose a multi-modal prototype loss that supervises the network jointly with the cross-entropy loss, encouraging the network to learn modality-specific information by improving intra-class cross-modality similarity and enlarging inter-class differences. Experimental results on benchmark multi-modal Re-ID datasets (RGBNT201, RGBNT100, MSVR310) and constructed multi-modal person Re-ID datasets (multi-modal versions of Market1501 and PRW) validate the effectiveness of the proposed approach against state-of-the-art methods.
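The abstract does not give the exact form of the low-rank fusion module, but the description (parallel decomposition with low-rank modality-specific factors) matches the general low-rank multimodal fusion pattern: each modality's feature is projected by a rank-stacked factor, the projections are combined by element-wise product across modalities, and the rank dimension is summed out. The sketch below is a minimal NumPy illustration under those assumptions; the function name, shapes, and the appended-1 augmentation are illustrative, not the paper's implementation.

```python
import numpy as np

def low_rank_fusion(feats, factors):
    """Fuse per-modality features with low-rank modality-specific factors.

    feats   : list of 1-D feature vectors, one per modality, shapes (d_m,)
    factors : list of arrays, one per modality, shapes (rank, d_m + 1, d_out)
    returns : fused vector of shape (d_out,)
    """
    fused = None
    for z, W in zip(feats, factors):
        z_aug = np.concatenate([z, [1.0]])          # append 1 so unimodal terms survive the product
        proj = np.einsum('d,rdo->ro', z_aug, W)     # (rank, d_out) projection for this modality
        fused = proj if fused is None else fused * proj  # element-wise product across modalities
    return fused.sum(axis=0)                        # collapse the rank dimension

# toy usage: RGB (d=2) and NI (d=3) features fused into a 5-dim vector at rank 4
rgb, ni = np.ones(2), np.ones(3)
W_rgb, W_ni = np.ones((4, 3, 5)), np.ones((4, 4, 5))
fused = low_rank_fusion([rgb, ni], [W_rgb, W_ni])
```

Because each modality keeps its own factor, the fusion cost grows linearly in the number of modalities rather than exponentially as with a full outer-product tensor.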
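The multi-modal prototype loss is described only at a high level (pull same-identity features together across modalities, push different identities apart). A minimal sketch consistent with that description is given below: it builds one shared prototype per identity from all modalities, penalizes each modality's distance to its prototype (intra-class term), and applies a margin-based penalty between prototypes of different identities (inter-class term). The function name, the margin hyperparameter, and the mean-based prototype are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def multimodal_prototype_loss(feats, labels, margin=1.0):
    """feats: dict modality_name -> (N, d) feature array; labels: (N,) identity ids."""
    ids = np.unique(labels)
    # shared prototype per identity: mean over all modalities and samples
    protos = {pid: np.concatenate([f[labels == pid] for f in feats.values()]).mean(axis=0)
              for pid in ids}
    # intra-class term: mean distance of each modality's features to the shared prototype
    intra = 0.0
    for f in feats.values():
        for pid in ids:
            intra += np.linalg.norm(f[labels == pid] - protos[pid], axis=1).mean()
    intra /= len(feats) * len(ids)
    # inter-class term: hinge on the distance between prototypes of different identities
    inter, pairs = 0.0, 0
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            inter += max(0.0, margin - np.linalg.norm(protos[a] - protos[b]))
            pairs += 1
    return intra + (inter / pairs if pairs else 0.0)
```

When identities are already compact across modalities and well separated, both terms vanish; overlapping identities raise the loss, which is what drives the cross-modality similarity described in the abstract.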