Abstract
Visible-Thermal Person Re-Identification (VTReID) is an inherently cross-modality problem that arises widely in real night-time surveillance scenarios, and its performance still leaves considerable room for improvement. In this work, we design a simple but effective Hard Modality Alignment Network (HMAN) framework to learn modality-robust features. Because existing VTReID methods do not account for the imbalance of cross-modality discrepancies, their models tend to suffer from selective alignment behavior. To address this problem, we propose a novel Hard Modality Alignment (HMA) loss that simultaneously balances and reduces the modality discrepancies. Specifically, we mine the hard feature subspace with large modality discrepancies and discard the easy feature subspace with small modality discrepancies, making the modality distributions more distinguishable. To mitigate the discrepancy imbalance, we place greater emphasis on reducing the modality discrepancies of the hard feature subspace than on those of the easy one. Furthermore, we jointly alleviate the modality heterogeneity of global and local visual semantics to further boost cross-modality retrieval performance. Extensive experiments on the RegDB and SYSU-MM01 datasets demonstrate the effectiveness of the proposed method, which achieves superior performance over state-of-the-art approaches.
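The abstract only describes the HMA loss at a high level. The sketch below is a hypothetical illustration of that general idea in PyTorch, not the authors' actual formulation: it measures a per-dimension discrepancy between the visible and thermal feature distributions, selects the "hard" subspace with the largest discrepancies, and weights its reduction more heavily than the "easy" subspace. The function name, the mean-difference discrepancy measure, and the `hard_ratio` / `hard_weight` parameters are all illustrative assumptions.

```python
# Hypothetical sketch of a hard-modality-alignment-style loss.
# NOT the paper's exact HMA loss; it only illustrates the idea of mining a
# hard feature subspace and emphasizing its modality-discrepancy reduction.
import torch


def hard_modality_alignment_sketch(feat_vis, feat_thm, hard_ratio=0.5, hard_weight=2.0):
    """feat_vis, feat_thm: (batch, dim) embeddings from the two modalities."""
    # Per-dimension discrepancy between the modality means (assumed measure).
    disc = (feat_vis.mean(dim=0) - feat_thm.mean(dim=0)).abs()  # shape: (dim,)

    # Dimensions with the largest discrepancy form the "hard" subspace.
    k = max(1, int(hard_ratio * disc.numel()))
    hard_idx = torch.topk(disc, k).indices

    # Weight the hard subspace more strongly than the easy one.
    weights = torch.ones_like(disc)
    weights[hard_idx] = hard_weight
    return (weights * disc).sum() / weights.sum()


if __name__ == "__main__":
    # Random tensors stand in for backbone features of the two modalities.
    v = torch.randn(32, 256)
    t = torch.randn(32, 256)
    print(hard_modality_alignment_sketch(v, t).item())
```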