In this article, we address the face image translation task, which aims to translate a face image from a source domain to a target domain. Although recent studies have made significant progress, face image translation remains challenging because it places stricter requirements on texture details: even a few artifacts greatly degrade the perceived quality of generated face images. To synthesize high-quality face images with good visual appearance, we revisit the coarse-to-fine strategy and propose a novel parallel multistage architecture based on generative adversarial networks (PMSGAN). More specifically, PMSGAN progressively learns the translation function by decomposing the general synthesis process into multiple parallel stages that take images with gradually decreasing spatial resolution as inputs. To promote information exchange between stages, a cross-stage atrous spatial pyramid (CSASP) structure is specially designed to receive and fuse contextual information from other stages. At the end of the parallel model, we introduce a novel attention-based module that leverages multistage decoded outputs as in situ supervised attention to refine the final activations and yield the target image. Extensive experiments on several face image translation benchmarks show that PMSGAN performs considerably better than state-of-the-art approaches.
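As an illustration of the multi-resolution inputs described above, the following minimal sketch builds one input per stage by repeatedly downsampling a grayscale image. The abstract does not specify the downsampling operator, the scale factor, or the number of stages, so the 2x2 average pooling, the factor of 2, and the names `downsample_2x` and `build_stage_inputs` are assumptions made purely for illustration and are not the authors' implementation.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def downsample_2x(image: Image) -> Image:
    """Halve height and width by averaging each 2x2 block (assumed operator)."""
    out_h, out_w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (
                image[2 * r][2 * c]
                + image[2 * r][2 * c + 1]
                + image[2 * r + 1][2 * c]
                + image[2 * r + 1][2 * c + 1]
            )
            / 4.0
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def build_stage_inputs(image: Image, num_stages: int) -> List[Image]:
    """Return one input per stage, ordered from full to lowest resolution."""
    inputs = [image]
    for _ in range(num_stages - 1):
        inputs.append(downsample_2x(inputs[-1]))
    return inputs


if __name__ == "__main__":
    # Hypothetical 8x8 test image whose pixel value encodes its position.
    img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    for stage, x in enumerate(build_stage_inputs(img, num_stages=3)):
        print(f"stage {stage}: {len(x)}x{len(x[0])}")
```

In a design of this kind, each stage would process its own resolution in parallel, and the cross-stage fusion and attention-based refinement would operate on the stage outputs; those components are not sketched here because the abstract gives no detail on their internals.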