In this paper, we propose a Cross-Modal Face Super-Resolution (CMFSR) method that reconstructs high-resolution (HR) facial images from low-resolution (LR) cross-modal facial images captured by disjoint visible-light (VIS) and near-infrared (NIR) cameras. Because modality transformation and information fusion are coupled, CMFSR has greater difficulty producing HR reconstructions than traditional super-resolution. To address this problem, we propose a Quasi-Siamese Domain Transfer Fusion Network (QSDTFN) for CMFSR, whose two branches transfer the two LR face modalities to the HR face modality via domain transfer. Unlike the two completely independent branches of a traditional pseudo-siamese network, only the HR-to-LR face transfer processes of the two branches in our quasi-siamese network are independent, while the LR-to-HR face transfer processes are coupled. The coupled module, called the Adaptive Weighted Domain Transfer Fusion Module (AWDTFM), disentangles the modality and identity information in the two LR faces, thereby achieving modality transformation and identity-information fusion simultaneously. To strengthen the optimization of the CMFSR process, the method further introduces a backward QSDTFN that forms a higher-level bidirectional structure with the forward QSDTFN, and designs two types of losses, an intra-network loss and an inter-network loss, to constrain modality and identity consistency within one QSDTFN and between the two QSDTFNs, respectively. Experimental results on challenging LR cross-modal face datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.
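To make the quasi-siamese structure concrete, below is a minimal PyTorch sketch of the idea the abstract describes: each LR modality is encoded by its own independent branch, while a single shared module stands in for the coupled LR-to-HR transfer (the AWDTFM), projecting each branch's features to identity codes and fusing them with adaptive weights before decoding an HR face. All layer choices, channel sizes, and the weighting scheme are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal PyTorch sketch of the quasi-siamese idea from the abstract:
# independent per-modality branches, coupled (shared) LR-to-HR fusion.
# Layer choices, channel sizes, and the adaptive weighting below are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Independent branch: encodes one LR modality on its own."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class AWDTFM(nn.Module):
    """Coupled module shared by both branches: projects each branch's
    features to identity codes, fuses them with adaptive per-pixel
    weights, and decodes the fused code to an HR face. Modality-specific
    codes (used elsewhere in the full method) are omitted for brevity."""
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.identity = nn.Conv2d(ch, ch, 1)       # identity code projection
        self.weight = nn.Sequential(                # adaptive fusion weights
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),
        )
        self.decode = nn.Sequential(                # LR-to-HR decoding
            nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, f_vis, f_nir):
        id_vis, id_nir = self.identity(f_vis), self.identity(f_nir)
        w = self.weight(torch.cat([f_vis, f_nir], dim=1))
        fused = w * id_vis + (1 - w) * id_nir       # weighted identity fusion
        return self.decode(fused)

class QSDTFN(nn.Module):
    """Quasi-siamese network: two independent encoders, one coupled fusion."""
    def __init__(self):
        super().__init__()
        self.enc_vis, self.enc_nir = Encoder(), Encoder()
        self.fusion = AWDTFM()

    def forward(self, lr_vis, lr_nir):
        return self.fusion(self.enc_vis(lr_vis), self.enc_nir(lr_nir))

# Two 32x32 LR faces from disjoint VIS/NIR cameras -> one 128x128 HR face.
hr = QSDTFN()(torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32))
print(hr.shape)  # torch.Size([1, 3, 128, 128])
```

In the full method, a backward QSDTFN would additionally map the HR result back toward the LR modalities, with the intra-network and inter-network losses enforcing modality and identity consistency; that bidirectional training loop is not sketched here.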