Abstract

Image-to-image translation, the synthesis task of transferring images from a source domain to a target domain, has made significant progress in recent years. Multimodal image-to-image translation aims to generate images in multiple styles of the target domain. However, existing multimodal image-to-image translation architectures cannot accurately transfer the style of a specified reference image. Moreover, they require an additional deep encoder network to extract the style code, which increases the number of network parameters. To address these problems, we propose Sunit, a multimodal unsupervised image-to-image translation method with a shared encoder. Sunit shares an encoder network between the discriminator and the style encoder, which reduces the number of network parameters and exploits the discriminator's features for style extraction. Furthermore, we design a training strategy in which the style encoder is trained solely with the style reconstruction loss rather than jointly with the generator, giving the style encoder a clearer objective. Finally, extensive experiments on the AFHQ and CelebA-HQ datasets demonstrate that our approach outperforms state-of-the-art methods on reference-guided image translation and transfers the style of a specified reference image more accurately.
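To make the shared-encoder idea concrete, the sketch below shows one way a single convolutional backbone could feed both a discriminator head and a style-encoder head, with the style branch updated only by an L1 style reconstruction loss on generated images (gradients into the generator are cut). This is a minimal PyTorch sketch under assumed design choices; the names SharedBackbone, DiscriminatorHead, StyleHead, G and opt_style, as well as the layer sizes, are illustrative assumptions and not the authors' implementation.

import torch
import torch.nn as nn


class SharedBackbone(nn.Module):
    # Convolutional feature extractor shared by the discriminator and the style encoder.
    def __init__(self, img_channels=3, base_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(img_channels, base_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_dim, base_dim * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_dim * 2, base_dim * 4, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (B, base_dim * 4)


class DiscriminatorHead(nn.Module):
    # Per-domain real/fake logits computed on top of the shared features.
    def __init__(self, feat_dim=256, num_domains=3):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_domains)

    def forward(self, feat, domain):
        logits = self.fc(feat)                       # (B, num_domains)
        return logits.gather(1, domain.view(-1, 1))  # logit for each sample's domain


class StyleHead(nn.Module):
    # Per-domain style codes extracted from the same shared features.
    def __init__(self, feat_dim=256, style_dim=64, num_domains=3):
        super().__init__()
        self.style_dim = style_dim
        self.fc = nn.Linear(feat_dim, num_domains * style_dim)

    def forward(self, feat, domain):
        codes = self.fc(feat).view(feat.size(0), -1, self.style_dim)
        return codes[torch.arange(feat.size(0)), domain]  # (B, style_dim)


def style_reconstruction_step(backbone, style_head, G, opt_style, x, domain, style_code):
    # Train the style branch with the style reconstruction loss only:
    # the generated image is detached so no gradient flows back into G,
    # and opt_style holds only the backbone / style-head parameters.
    fake = G(x, style_code).detach()
    recon = style_head(backbone(fake), domain)
    loss_sty = torch.mean(torch.abs(recon - style_code))  # L1 reconstruction of the style code
    opt_style.zero_grad()
    loss_sty.backward()
    opt_style.step()
    return loss_sty.item()

In this layout the discriminator and style encoder share every convolutional layer and differ only in their lightweight output heads, which is what yields the parameter saving described above.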
