Visible-infrared person re-identification (VI-ReID) is a crucial cross-modal matching task in computer vision. Existing research typically relies on a single auxiliary modality to bridge the semantic gap between cross-modal data, which fails to fully exploit the information inherent in both modalities. To address this issue, we propose a novel framework combining Implicit Modality Knowledge Alignment (IMKA) with Uncertainty Estimation (UE), which aims to strengthen the robustness of the learned common embedding space and improve the prediction accuracy of the VI-ReID model. The IMKA module extracts valuable supplementary information from the original modality data to generate implicit modality data, and the SKA module then aligns the distributions of significant knowledge learned from this generated implicit modality data. Meanwhile, the UE strategy mitigates overconfidence in incorrect predictions: the uncertainty of the predicted probabilities is estimated in a Dirichlet space, and a designed evidence loss strengthens confidence in the uncertainty estimates, which in turn enhances retrieval performance. Extensive experiments on two publicly available VI-ReID datasets demonstrate that our IMKA-UE framework significantly outperforms state-of-the-art methods. The code for our IMKA-UE framework is available at https://github.com/SWU-CS-MediaLab/IMKA-UE.
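To make the uncertainty estimation step concrete, the sketch below shows the standard evidential-deep-learning formulation in which classifier logits are mapped to Dirichlet concentration parameters and a per-sample uncertainty score, together with a digamma-form evidence loss. This is a generic illustration under assumed conventions (softplus evidence, digamma loss); the function names, the loss variant, and the class count are illustrative and are not taken from the IMKA-UE paper itself.

```python
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(logits):
    """Map classifier logits to Dirichlet parameters and a per-sample
    uncertainty score (generic evidential-deep-learning formulation,
    not necessarily the paper's exact design)."""
    evidence = F.softplus(logits)             # non-negative evidence per class
    alpha = evidence + 1.0                    # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True) # total Dirichlet strength S
    num_classes = logits.size(1)
    belief = evidence / strength              # per-class belief mass
    uncertainty = num_classes / strength      # total uncertainty mass, in (0, 1]
    return alpha, belief, uncertainty

def evidence_loss(alpha, labels):
    """Digamma-form evidential loss: penalizes confident (high-evidence)
    predictions that disagree with the ground-truth identity label."""
    strength = alpha.sum(dim=1, keepdim=True)
    one_hot = F.one_hot(labels, num_classes=alpha.size(1)).float()
    loss = (one_hot * (torch.digamma(strength) - torch.digamma(alpha))).sum(dim=1)
    return loss.mean()

# Toy usage: 8 samples scored against a hypothetical set of 395 identities.
logits = torch.randn(8, 395)
labels = torch.randint(0, 395, (8,))
alpha, belief, u = dirichlet_uncertainty(logits)
loss = evidence_loss(alpha, labels)
print(u.squeeze()[:4], loss.item())
```

In this view, the uncertainty score can be used at retrieval time to down-weight or re-rank matches on which the model has little accumulated evidence, which is one plausible reading of how such uncertainty information improves retrieval performance.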