As the most important topic in Heterogeneous Face Recognition (HFR), Near-InfraRed and VISual (NIR-VIS) face recognition has attracted increasing research attention owing to its potential applications in criminal investigation and multimedia information retrieval. However, because of dramatic intra-class variations in modality, pose, occlusion, blur, lighting, distance, expression, etc., it is very challenging to retain inherent identity information. To address this issue, we propose a novel Dual Face Alignment Learning (DFAL) algorithm that explores domain-invariant neutral face representations of cross-modal images. Our model contains three components: Feature-level Face Alignment (FFA), Image-level Face Alignment (IFA) and Cross-domain compact Representation (CdR). Firstly, Teacher-Encoder CNNs (TeEn-CNNs) and Student-Encoder CNNs (StEn-CNNs) are designed to encode features for VIS neutral face images and non-neutral face images, respectively, and the FFA is introduced to learn neutral face representations by performing feature-level alignment between non-neutral faces and VIS neutral faces. Secondly, Student-Decoder CNNs (StDe-CNNs) are developed to decode features back into face images, and the IFA is designed to reconstruct the neutral face image by imposing image-level alignment. Notably, the FFA serves as the primary objective for learning VIS neutral face representations from cross-view data, while the IFA provides a complementary refinement, further disentangling domain and residual information through the synthesis process. Finally, the CdR dispels modality features and distills identity features by mining inter-class information, inter-domain information and inter-semantic relationships. The joint scheme eliminates intra-class variations and purifies identity information.
We carry out comprehensive experiments to illustrate the effectiveness of the DFAL approach on three challenging NIR-VIS databases.