Abstract

Given a person of interest in RGB images, Visible-Infrared Person Re-identification (VI-REID) aims to retrieve the same person in infrared images. The task faces a number of challenges due to large cross-modality discrepancies and intra-modality variations caused by illumination, human pose, viewpoint, and cluttered backgrounds. This paper proposes a Mask-guided Dual Attention-aware Network (MDAN) for VI-REID. MDAN consists of two individual networks, one for each modality, whose feature representations are driven by mask-guided attention-aware information and multi-loss constraints. Specifically, we first utilize the masked image as a supplement to the original image to enhance contour and appearance information, which are extremely important cues for matching pedestrians across the visible and infrared modalities. Second, a Residual Attention Module (RAM) is put forward to capture fine-grained features and subtle differences among pedestrians, learning more discriminative representations across the heterogeneous modalities by adaptively calibrating feature responses along the channel and spatial dimensions. Third, the features from the two modality-specific streams are directly aggregated to form a cross-modality identity representation. Extensive experiments demonstrate that the proposed approach effectively improves VI-REID performance and remarkably outperforms state-of-the-art methods.
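The abstract states only that RAM recalibrates feature responses along the channel and spatial dimensions with a residual connection; it does not give the exact formulation. The sketch below is a minimal PyTorch illustration of one plausible CBAM-style realization under that description, assuming sequential channel-then-spatial attention and an additive residual. The class name, reduction ratio, and 7x7 spatial kernel are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn


class ResidualAttentionModule(nn.Module):
    """Hypothetical sketch of a RAM-like block: channel attention followed
    by spatial attention, applied with a residual connection (assumption)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # --- channel attention: shared MLP over avg- and max-pooled maps ---
        avg = x.mean(dim=(2, 3))                       # (B, C)
        mx = x.amax(dim=(2, 3))                        # (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x_ca = x * ca.view(b, c, 1, 1)
        # --- spatial attention: conv over channel-wise avg/max maps ---
        sa_in = torch.cat(
            [x_ca.mean(dim=1, keepdim=True), x_ca.amax(dim=1, keepdim=True)],
            dim=1,
        )                                              # (B, 2, H, W)
        sa = torch.sigmoid(self.spatial_conv(sa_in))   # (B, 1, H, W)
        # Residual connection preserves the original feature signal.
        return x + x_ca * sa
```

One such block could be inserted after a convolutional stage in each modality stream; the residual path keeps the calibration from suppressing features the attention maps miss.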
