Abstract

Unsupervised cross-domain image-to-image translation is a very active topic in computer vision and graphics. The task poses two challenges: 1) the lack of paired training data and 2) the many possible outputs for a single input image. Existing methods either rely on paired data or perform only one-to-one translation. To overcome these disadvantages, this paper proposes a novel Multi-Style Unsupervised image synthesis model using Generative Adversarial Nets (MSU-GAN). First, an encoder-decoder structure maps the image to a domain-shared content feature space and a domain-specific style feature space. Second, to translate an image into another domain, the content code and the style code are combined to synthesize the resulting image. Finally, a bidirectional cycle-consistency loss handles the unpaired training data, while an inter-domain adversarial loss and a reconstruction loss ensure the realism of the output image. Because the representation is disentangled, MSU-GAN can also synthesize multi-style images. For shape variation tasks, a Multi-Style Unsupervised Feature-Wise image synthesis model using Generative Adversarial Nets (MSU-FW-GAN), based on MSU-GAN, is proposed. Two testing strategies are used: random style transfer and style-guided transfer. In objective comparisons, the proposed models perform well on all evaluation metrics. In the random style transfer experiments, compared with CycleGAN on the photo2portraits dataset, the FID and IS scores of MSU-FW-GAN dropped by 12.77% and 8.06%, respectively. On the summer2winter dataset, the FID and IS scores of MSU-GAN increased by 24.51% and 3.64%, respectively. Qualitative results show that, without paired training data, MSU-GAN and MSU-FW-GAN can synthesize multi-style and more realistic images on various tasks.
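
The content/style split and the two testing strategies described above can be illustrated with a toy sketch. This is not the paper's model: the "images" are short lists of numbers, the style code is simply a global mean and standard deviation (a stand-in for a learned style encoder), and the function names `encode`, `decode`, `translate`, and `cycle_l1` are hypothetical. It only shows how a content code from one image can be recombined with a style code taken from a guide image or sampled at random, and how a cycle-consistency gap could be measured.

```python
import random
import statistics


def encode(image):
    """Split a 1-D toy 'image' into a content code and a style code."""
    mean = statistics.fmean(image)         # global average over all pixels
    std = statistics.pstdev(image) or 1.0  # fall back to 1.0 for a constant image
    content = [(p - mean) / std for p in image]  # style-normalized structure
    return content, (mean, std)


def decode(content, style):
    """Recombine a content code with a style code into an image."""
    mean, std = style
    return [c * std + mean for c in content]


def translate(image, style):
    """Keep the image's content code, replace its style code."""
    content, _ = encode(image)
    return decode(content, style)


def cycle_l1(image_a, style_b):
    """Mean absolute error after translating a -> b -> a."""
    _, style_a = encode(image_a)
    fake_b = translate(image_a, style_b)
    back_a = translate(fake_b, style_a)
    return sum(abs(x - y) for x, y in zip(image_a, back_a)) / len(image_a)


photo = [0.1, 0.4, 0.35, 0.8]      # toy source-domain image
portrait = [0.9, 0.7, 0.95, 0.6]   # toy target-domain guide image

# Style-guided transfer: the style code comes from a guide image.
_, guide_style = encode(portrait)
guided = translate(photo, guide_style)

# Random style transfer: the style code is sampled (ranges are arbitrary here).
rng = random.Random(0)
random_style = (rng.uniform(0.0, 1.0), rng.uniform(0.05, 0.5))
sampled = translate(photo, random_style)
```

In this toy setting the cycle gap is zero up to floating-point error because `encode` and `decode` are exact inverses; in a learned model the cycle-consistency loss is what pushes the networks toward that behavior.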

Highlights

  • Many computer vision and graphics problems aim to learn the cross-domain image-to-image translation between two or more domains

  • More sophisticated models that simplify multi-style image synthesis with generative adversarial networks would help address these challenges

  • In this paper, a general and effective framework is proposed for multi-style unsupervised image-to-image translation

Summary

INTRODUCTION

Many computer vision and graphics problems aim to learn a cross-domain image-to-image translation between two or more domains. Examples include colorization [1]–[3], super-resolution [4], [5], image synthesis [1], [6], style transfer [7]–[9], inpainting [10], and domain adaptation [11]. Image-to-image (I2I) translation is challenging for two reasons: 1) labeled or paired data are often lacking, and 2) many I2I translation tasks call for multi-style outputs. StarGAN [14] uses a single generator to achieve multi-domain translation, but it needs labeled training data.

[Figure residue: architecture diagram with labels for the content encoder EAC (downsampling), content code c, style code, adaptive average pooling, generator GB, and the losses Ladv A and Lstyle A. Table residue: a model comparison listing CycleGAN and DualGAN.]