Abstract

Despite promising preliminary results, existing cross-modality Visible-Infrared Person Re-IDentification (VI-PReID) models that incorporate semantic (person) masks simply use these masks as selection maps to separate person features from background regions. Such models are not dedicated to extracting more modality-invariant person body features within the VI-PReID network itself, which leads to suboptimal VI-PReID results. In contrast, we aim to better capture person body information within the VI-PReID network itself by exploiting the inner relations between person mask prediction and VI-PReID. To this end, we present a novel multi-task learning model in which person body features obtained through person mask prediction facilitate the extraction of discriminative, modality-shared person body information for VI-PReID. Furthermore, considering the difference between the person mask prediction and VI-PReID tasks, we propose a novel task translation sub-network that transfers the discriminative person body information extracted by person mask prediction into VI-PReID. This enables our model to better exploit discriminative and modality-invariant person body information. Thanks to these more discriminative modality-shared features, our method outperforms previous state-of-the-art approaches by a significant margin on several benchmark datasets. Our findings validate the effectiveness of extracting discriminative person body features for the VI-PReID task.
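
The abstract does not give implementation details, so the following PyTorch sketch only illustrates the general multi-task shape it describes: a shared backbone, an auxiliary person-mask-prediction head, and a task translation sub-network feeding translated person body features into the re-ID branch. All module names, layer sizes, and the feature fusion step below are assumptions for illustration, not the authors' actual architecture.

# Minimal sketch of the multi-task design described in the abstract.
# Module names, layer sizes, and the fusion of translated features into the
# re-ID branch are assumptions; the paper's actual architecture is not
# specified in the abstract.
import torch
import torch.nn as nn


class MultiTaskVIPReID(nn.Module):
    def __init__(self, num_ids: int, feat_dim: int = 256):
        super().__init__()
        # Shared backbone applied to both visible and infrared images (hypothetical).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Auxiliary head: person mask prediction (person vs. background logits).
        self.mask_head = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # Task translation sub-network: adapts person-body features learned by
        # the mask-prediction branch before they are injected into re-ID.
        self.task_translation = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, kernel_size=1), nn.ReLU(inplace=True),
        )
        # Re-ID head: global pooling followed by an identity classifier.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(feat_dim, num_ids)

    def forward(self, x):
        feats = self.backbone(x)                    # shared features
        mask_logits = self.mask_head(feats)         # person mask prediction
        translated = self.task_translation(feats)   # task translation
        reid_feats = self.pool(feats + translated).flatten(1)
        id_logits = self.classifier(reid_feats)
        return id_logits, mask_logits


# Joint training would combine an identity loss with a mask-prediction loss,
# e.g. loss = ce(id_logits, id_labels) + bce(mask_logits, person_masks).

In this kind of setup, the two heads are trained jointly so that supervision from person mask prediction shapes the shared features, while the translation module accounts for the difference between the segmentation and re-ID tasks; how the paper concretely realizes each component is described in the full text.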
