Adversarial Disentanglement Spectrum Variations and Cross-Modality Attention Networks for NIR-VIS Face Recognition

Weipeng Hu, Haifeng Hu

doi:10.1109/tmm.2020.2980201

Abstract

Near-infrared and visual (NIR-VIS) matching refers to face recognition between two images of different modalities, and it remains a challenging task in machine vision. NIR-VIS Heterogeneous Face Recognition (HFR) faces two main problems: large intra-class differences caused by cross-modal data, and insufficient paired training samples. In this paper, we propose Adversarial Disentanglement spectrum variations and Cross-modality Attention Networks (ADCANs) for the NIR-VIS matching task. ADCANs introduces three key components to reduce the gap between cross-modal images: Advanced Scatter Loss (ASL), Modality-adversarial Feature Learning (MaFL), and the Cross-modality Attention Block (CmAB). The ASL captures between-class and within-class information of the data and embeds it into the network for more effective training; it focuses on categories separated by a small between-class distance and increases the distance between them. The MaFL consists of an Identity-Discriminative Feature Learning Network (IDFLN) and a Modality-Adversarial Disentanglement Network (MADN), which together enhance identity-discriminative feature representations and disentangle spectrum variations via adversarial learning. The IDFLN, built as an end-to-end CNN, aims at learning identity-discriminative features, while the MADN, built from a discriminator $D$ and a generator $G$, focuses on removing modality-related information. Furthermore, to increase representation power and disentangle spectrum variations effectively, a CmAB is introduced to the network, which sequentially applies spatial and channel attention modules to both the IDFLN and the MADN. Since the channel attention module focuses on ‘what’ features to suppress or emphasize, an orthogonality constraint is imposed on the two channel attention modules, which allows the MADN and the IDFLN to focus on learning modality-related features and identity-related features, respectively. In particular, ADCANs stacks multiple CmABs to learn discriminative features and disentangle spectrum variations. Extensive experiments on three challenging HFR datasets indicate that the proposed ADCANs is effective for the NIR-VIS HFR task.
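
To make the orthogonality constraint on the two channel attention modules more concrete, the following is a minimal NumPy sketch and not the authors' implementation. The abstract does not give the attention architecture or the exact form of the constraint, so everything here is an assumption for illustration: the squeeze-and-excitation style `channel_attention`, the squared-cosine `orthogonality_penalty`, and all shapes and weight names are hypothetical.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Assumed SE-style channel attention: feat is (C, H, W), returns (C,) weights."""
    squeezed = feat.mean(axis=(1, 2))              # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU bottleneck -> (C // r,)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid -> (C,), values in (0, 1)

def orthogonality_penalty(a_id, a_mod, eps=1e-12):
    """Assumed penalty: squared cosine similarity between the two attention vectors."""
    cos = (a_id @ a_mod) / (np.linalg.norm(a_id) * np.linalg.norm(a_mod) + eps)
    return float(cos ** 2)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                            # hypothetical sizes
feat = rng.standard_normal((C, H, W))              # stand-in for a shared feature map

# Separate (hypothetical) attention weights for the identity and modality branches.
w1_id, w2_id = rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))
w1_mod, w2_mod = rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))

a_id = channel_attention(feat, w1_id, w2_id)       # IDFLN-side channel weights
a_mod = channel_attention(feat, w1_mod, w2_mod)    # MADN-side channel weights
penalty = orthogonality_penalty(a_id, a_mod)       # would be added to the training loss
```

Under these assumptions, minimizing `penalty` pushes the two branches to emphasize different channels. Because sigmoid outputs are strictly positive, the penalty in this sketch can approach zero but never reach it exactly; a formulation on unnormalized or centered attention vectors would behave differently.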

Full Text