Abstract

Most existing near-infrared to visible (NIR-VIS) face recognition (FR) methods rely on global feature representations to reduce cross-modality discrepancies, but ignore the structural relationships between local features, e.g., the relative positions of the eyes, nose, and mouth. Precise alignment of these local features can enhance the learning of modality-invariant face representations, thereby improving NIR-VIS FR performance. Therefore, in this letter, we propose an intrinsic structured graph alignment (ISGA) module that achieves graph-level alignment of local features across modalities. To this end, we first construct an intrinsic structure graph to model the inherent structural relationships of local features, and then enhance the discriminative feature representation by aligning the graphs between modalities. To jointly encourage cross-modality class consistency in both semantics and structural relationships, a cross-modality class distribution (CMCD) loss is proposed that imposes an identity-preserving constraint on each class distribution in the embedding space shared by the two modalities. To counteract the resulting suppression of class separability, we maximize the mutual information between inputs and class predictions. Extensive experiments on challenging NIR-VIS datasets indicate that our approach outperforms state-of-the-art methods. The code is available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/JianYu777/ISGA-CMCD</uri>.
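The abstract mentions two loss components: an identity-preserving constraint on per-class distributions across modalities (CMCD), and mutual-information maximization between inputs and class predictions. The sketch below is a hypothetical illustration of these two ideas, not the paper's actual formulation: the CMCD-style term is approximated here as an L2 distance between per-class mean embeddings of the two modalities, and the mutual-information term uses the standard decomposition I(X; Ŷ) = H(mean prediction) − mean per-sample entropy. All function names and design choices are assumptions.

```python
import numpy as np

def class_distribution_loss(nir_emb, vis_emb, labels):
    """Hypothetical CMCD-style term: pulls the per-class mean embeddings
    of the NIR and VIS modalities together (assumed simplification of an
    identity-preserving class-distribution constraint)."""
    loss = 0.0
    classes = np.unique(labels)
    for c in classes:
        mu_nir = nir_emb[labels == c].mean(axis=0)  # class mean, NIR modality
        mu_vis = vis_emb[labels == c].mean(axis=0)  # class mean, VIS modality
        loss += np.sum((mu_nir - mu_vis) ** 2)
    return loss / len(classes)

def mutual_information(probs, eps=1e-8):
    """I(X; Y_hat) for softmax predictions `probs` of shape (N, K):
    entropy of the marginal (batch-mean) prediction minus the mean
    per-sample entropy. Maximizing this encourages confident yet
    diverse class predictions, preserving class separability."""
    p_bar = probs.mean(axis=0)                                    # marginal over the batch
    h_marginal = -np.sum(p_bar * np.log(p_bar + eps))             # H(mean prediction)
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional
```

In a training loop, these terms would be weighted and added to the usual identity-classification loss; perfectly aligned class means drive the first term to zero, while confident, evenly spread predictions drive the mutual-information term toward log K.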
