Visible–infrared person re-identification (VI-ReID) is a challenging retrieval task that aims to match images of the same pedestrian across the visible and infrared modalities. Most existing works achieve performance gains by addressing the inherent cross-modality discrepancy; however, they do not fully mine modality information, which leads to poor generalization. In addition, pedestrian images are difficult to align well due to large inter- and intra-class variations. To tackle these limitations, we propose a novel dual-modality alignment network (DMANet) for VI-ReID. The core idea of our work is to develop multi-granularity features mutual learning (MGFML) to address the inadequate perception of modality information, and to alleviate the modality difference with an inter- and intra-modality alignment module (IIMA). Specifically, we first propose an effective multi-granularity features mutual learning module that mines multi-granularity features and combines domain alignment with self-distillation to relieve the modality discrepancy. In addition, a maximum mean discrepancy loss and a mutual learning loss are introduced to enhance the identity-aware ability of DMANet. Second, an effective inter- and intra-modality alignment module is presented to explore the potential alignment relations within and between modalities. Finally, a joint learning mechanism over multi-granularity features and modality alignment is used to improve VI-ReID accuracy. Extensive experiments on mainstream benchmarks demonstrate that our method is superior to state-of-the-art methods.
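For context, the maximum mean discrepancy mentioned above is commonly estimated with a kernel-based empirical form; the exact formulation used in DMANet is not given in the abstract, but a standard squared-MMD estimator, assuming visible features $\{v_i\}_{i=1}^{n}$, infrared features $\{r_j\}_{j=1}^{m}$, and a kernel $k(\cdot,\cdot)$, is
$$
\mathcal{L}_{\mathrm{MMD}} = \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{i'=1}^{n} k(v_i, v_{i'}) + \frac{1}{m^{2}}\sum_{j=1}^{m}\sum_{j'=1}^{m} k(r_j, r_{j'}) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} k(v_i, r_j),
$$
which penalizes the distance between the visible and infrared feature distributions and thus serves the domain-alignment role described in the abstract.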