Abstract

Existing facial image super-resolution (SR) methods focus mostly on improving artificially down-sampled low-resolution (LR) imagery. Such SR models, although strong at handling artificial LR images, often suffer a significant performance drop on genuine LR test data. Previous unsupervised domain adaptation (UDA) methods address this issue by training a model on unpaired genuine LR and high-resolution (HR) data with a cycle consistency loss formulation. However, this overstretches the model with two tasks: consistifying the visual characteristics and enhancing the image resolution. Importantly, it makes end-to-end model training ineffective due to the difficulty of back-propagating gradients through two concatenated CNNs. To solve this problem, we formulate a method that joins the advantages of conventional SR and UDA models. Specifically, we separate and control the optimisations for characteristic consistifying and image super-resolving by introducing Characteristic Regularisation (CR) between them. This task split makes the model training more effective and computationally tractable. Extensive evaluations demonstrate the performance superiority of our method over state-of-the-art SR and UDA models on both genuine and artificial LR facial imagery data.
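The two-stage split described in the abstract can be sketched as a consistifying module and an SR module coupled only through a regularisation term at their interface. Everything below is a hypothetical illustration, not the paper's actual formulation: the function names, the use of simple array stubs in place of CNNs, the L2 form of the regulariser, and the `lam` weight are all assumptions made to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def consistify(x_genuine, w):
    # Stage 1 (stub): map genuine-LR characteristics towards artificial-LR ones.
    return w * x_genuine

def super_resolve(x_lr, w):
    # Stage 2 (stub): nearest-neighbour 2x upscaling standing in for an SR CNN.
    return np.repeat(np.repeat(w * x_lr, 2, axis=0), 2, axis=1)

def training_losses(x_genuine, x_artificial, y_hr, w_c, w_s, lam=0.1):
    z = consistify(x_genuine, w_c)
    # Characteristic Regularisation (hypothetical L2 proxy): keep stage-1 output
    # close to the artificial-LR domain so stage 2 sees familiar inputs.
    cr = float(np.mean((z - x_artificial) ** 2))
    # Supervised SR loss on the paired artificial-LR / HR branch.
    sr = float(np.mean((super_resolve(x_artificial, w_s) - y_hr) ** 2))
    # The CR term at the interface lets each stage be optimised in a controlled
    # way, rather than back-propagating through both concatenated networks.
    return sr + lam * cr, sr, cr

x_g = rng.random((16, 16))                             # genuine LR (toy)
x_a = rng.random((16, 16))                             # artificial LR (toy)
y = np.repeat(np.repeat(x_a, 2, axis=0), 2, axis=1)    # toy HR target
total, sr, cr = training_losses(x_g, x_a, y, w_c=1.0, w_s=1.0)
```

The point of the sketch is only the loss decomposition: the SR branch is supervised with paired artificial data, while the CR term alone ties the genuine-LR branch to it.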

Highlights

  • Facial image analysis [32, 6, 2] is significant for many computer vision applications in business, law enforcement, and public security [26]

  • With this down-sampling paradigm, existing supervised deep learning models (e.g. CNNs) can be readily applied. This comes at the price of poor model generalisation to real-world genuine LR facial images, e.g. surveillance imagery captured under poor conditions

  • This is because genuine LR data have rather different imaging characteristics from artificial LR images, often coming with additional unconstrained motion blur, noise, corruption, and image compression artefacts (Fig. 3). This causes a distribution discrepancy between training data and test data that contributes to poor model generalisation, known as the domain shift problem [26]


Summary

Introduction

Facial image analysis [32, 6, 2] is significant for many computer vision applications in business, law enforcement, and public security [26]. Existing state-of-the-art image SR models [8, 37, 41] mostly learn the low-to-high resolution mapping from paired artificial LR and HR images. The artificial LR images are usually generated by down-sampling the HR counterparts (Fig. 1(a)). With this paradigm, existing supervised deep learning models (e.g. CNNs) can be readily applied. This comes at the price of poor model generalisation to real-world genuine LR facial images, e.g. surveillance imagery captured under poor conditions. This is because genuine LR data have rather different imaging characteristics from artificial LR images, often coming with additional unconstrained motion blur, noise, corruption, and image compression artefacts. This causes a distribution discrepancy between training data (artificial LR imagery) and test data (genuine LR imagery) that contributes to poor model generalisation, known as the domain shift problem [26].
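The artificial-LR generation step above can be sketched as follows. SR papers typically use bicubic down-sampling; this dependency-free sketch substitutes simple average-pooling (box filtering), and the `make_artificial_lr` name and ×4 factor are illustrative assumptions rather than the cited models' settings.

```python
import numpy as np

def make_artificial_lr(hr: np.ndarray, factor: int = 4) -> np.ndarray:
    """Create an artificial LR image by average-pooling down-sampling.

    hr: HxWxC float array whose height and width are divisible by `factor`.
    (Box filtering stands in for the bicubic kernel used in practice.)
    """
    h, w, c = hr.shape
    assert h % factor == 0 and w % factor == 0
    # Group pixels into factor x factor blocks and average each block.
    return hr.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

# Paired training data: each (artificial LR, HR) tuple supervises the SR mapping.
hr = np.random.default_rng(0).random((128, 128, 3))
lr = make_artificial_lr(hr, factor=4)   # 32x32x3
```

Because the LR input is derived deterministically from the HR target, every HR image yields a perfectly aligned training pair, which is exactly what makes this paradigm convenient for supervised learning and, per the text above, what creates the domain gap to genuine LR data.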

