Abstract

Surveillance cameras often capture near-infrared images, since they provide a low-cost and effective way to acquire high-quality images in low-light environments. However, visual versus near-infrared (VIS-NIR) heterogeneous face recognition (HFR) remains a challenging problem in the computer vision community because of the gap between the sensing patterns of different spectra and the lack of sufficient training samples. To address this problem, we present Disentangled Spectrum Variations Networks (DSVNs) for VIS-NIR HFR. Two key strategies are introduced in the DSVNs to disentangle spectrum variations between the two domains: Spectrum-adversarial Discriminative Feature Learning (SaDFL) and Step-wise Spectrum Orthogonal Decomposition (SSOD). The SaDFL consists of an Identity-Discriminative subnetwork (IDNet) and an Auxiliary Spectrum Adversarial subnetwork (ASANet). The IDNet is composed of a generator $G_H$ and a discriminator $D_U$ and extracts identity-discriminative features. The ASANet is built from the same generator $G_H$ and a discriminator $D_M$, whose guidance is used to eliminate modality-variant spectrum information. The DSVNs are trained with a triplet loss on HFR datasets annotated with identity and modality labels. Together, the IDNet and ASANet enhance the domain-invariant feature representations through adversarial learning. Furthermore, to disentangle spectrum variations effectively and to make identity information and modality information unrelated to each other, we present a new connection-block topology called Disentangled Spectrum Variations (DSV). An orthogonality constraint is imposed on the DSV at the convolution level to achieve a channel-wise orthogonal decomposition between the modality-invariant identity information and the modality-variant spectrum information.
In particular, the SSOD is built by stacking multiple modularized DSV micro-blocks, and thereby disentangles spectrum variations step by step. Moreover, we investigate the similarity calculation method to further improve HFR performance. In summary, the designed DSVNs purify identity information and eliminate modality information. Extensive experiments on two challenging VIS-NIR HFR datasets, CASIA NIR-VIS 2.0 and Oulu-CASIA NIR-VIS, demonstrate the superiority of the proposed method.
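
To make the convolution-level orthogonality constraint more concrete, the sketch below shows one common way such a penalty can be written. The abstract does not give the exact formulation used in the DSV block, so everything here is an assumption for illustration: the function name `orthogonality_penalty`, the split of the filters into an identity bank `w_id` and a spectrum bank `w_sp`, and the choice of the squared Frobenius norm of their cross-correlation as the penalty.

```python
import numpy as np


def orthogonality_penalty(w_id, w_sp):
    """Hypothetical penalty that is zero when every identity filter is
    orthogonal to every spectrum filter.

    w_id: identity-branch filters, shape (c_id, c_in, k, k)  (assumed layout)
    w_sp: spectrum-branch filters, shape (c_sp, c_in, k, k)  (assumed layout)
    """
    # Flatten each output channel's filter into one row vector.
    a = w_id.reshape(w_id.shape[0], -1)
    b = w_sp.reshape(w_sp.shape[0], -1)
    # Entry (i, j) is the inner product of identity filter i and spectrum filter j.
    cross = a @ b.T
    # Squared Frobenius norm: sum of all squared inner products.
    return float(np.sum(cross ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_id = rng.normal(size=(8, 16, 3, 3))
    w_sp = rng.normal(size=(4, 16, 3, 3))
    # Random filters are generally not orthogonal, so the penalty is positive.
    print(orthogonality_penalty(w_id, w_sp))
```

In a training setup, a term of this kind would typically be added to the main loss with a small weight, so that the two channel groups are pushed toward carrying unrelated information; the weighting and where the term is applied within each DSV block are not specified in the abstract.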
