Near-infrared and visual (NIR-VIS) face matching, as the most typical task in Heterogeneous Face Recognition (HFR), has attracted increasing attention in recent years. However, due to the large within-class discrepancies, including domain differences and residual discrepancies (e.g., lighting, expression, occlusion, blur, and pose), it remains a difficult task. Conventional NIR-VIS face recognition methods focus only on reducing the modality gap between cross-domain images and neglect to eliminate the residual variations. To address these problems, this paper proposes a novel Orthogonal Modality Disentanglement and Representation Alignment (OMDRA) approach, which consists of three key components: a Modality-Invariant (MI) loss, Orthogonal Modality Disentanglement (OMD), and Deep Representation Alignment (DRA). First, the MI loss is designed to learn modality-invariant and identity-discriminative representations by increasing between-class separability and within-class compactness across NIR and VIS heterogeneous data. Second, the high-level Hybrid Facial Feature (HFF) layer of the backbone network is projected into two subspaces: a modality-related subspace and an identity-related subspace. The OMD decouples modality information via an adversarial process, and we further impose Orthogonal Representation Decorrelation (ORD) on the OMD to decrease the correlation between identity and modality representations and to enhance their representational capability. Finally, the DRA eliminates the residual variations by aligning the high-level representations of non-neutral faces with those of neutral faces, which effectively guides the network to learn discriminative and residual-invariant face representations. The joint scheme enables the disentanglement of modality variations, the elimination of residual discrepancies, and the purification of identity information. Extensive experiments on challenging cross-domain databases indicate that our OMDRA method outperforms state-of-the-art methods.
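To make the orthogonal decorrelation and representation alignment ideas concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation. The function names, feature shapes, and the specific loss forms (a squared-cosine orthogonality penalty and a stop-gradient MSE alignment toward neutral-face embeddings) are assumptions for illustration only.

```python
# Minimal sketch (PyTorch) of two abstract-level ideas; NOT the authors' released code.
# Tensor names, shapes, and the projection heads producing these features are assumed.
import torch
import torch.nn.functional as F


def orthogonal_decorrelation(id_feat: torch.Tensor, mod_feat: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between identity-subspace and modality-subspace features.

    id_feat, mod_feat: (batch, dim) projections of the hybrid facial feature (HFF)
    into the identity-related and modality-related subspaces (hypothetical heads).
    """
    id_n = F.normalize(id_feat, dim=1)
    mod_n = F.normalize(mod_feat, dim=1)
    # Squared cosine similarity per sample: pushing it toward zero drives the two
    # representations toward orthogonality, reducing their correlation.
    return (id_n * mod_n).sum(dim=1).pow(2).mean()


def representation_alignment(nonneutral_feat: torch.Tensor, neutral_feat: torch.Tensor) -> torch.Tensor:
    """Pull the embedding of a non-neutral face toward its neutral-face counterpart.

    The neutral embedding is treated as a fixed target (stop-gradient), so only the
    non-neutral branch is adjusted; this is one plausible reading of the alignment step.
    """
    return F.mse_loss(F.normalize(nonneutral_feat, dim=1),
                      F.normalize(neutral_feat.detach(), dim=1))


if __name__ == "__main__":
    b, d = 8, 512  # assumed batch size and feature dimension
    id_feat = torch.randn(b, d, requires_grad=True)
    mod_feat = torch.randn(b, d)
    nonneutral = torch.randn(b, d, requires_grad=True)
    neutral = torch.randn(b, d)
    loss = orthogonal_decorrelation(id_feat, mod_feat) + representation_alignment(nonneutral, neutral)
    loss.backward()
    print(f"combined auxiliary loss: {loss.item():.4f}")
```

In a full training pipeline these terms would be added to the identity-discriminative objective (the MI loss) and the adversarial modality-decoupling loss; the sketch above only illustrates the auxiliary constraints in isolation.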