Abstract

Image-to-image translation, the synthesis task of transferring images from a source domain to a target domain, has made significant progress in recent years. Multimodal image-to-image translation aims to generate images in multiple styles of the target domain. However, existing multimodal image-to-image translation architectures cannot accurately transfer the style of a specified reference image. Moreover, they require an additional deep encoder network to extract the style code, which increases the number of network parameters. To address these problems, we propose Sunit, a multimodal unsupervised image-to-image translation method with a shared encoder. Sunit shares an encoder network between the discriminator and the style encoder, which reduces the number of network parameters and exploits the discriminator's features for style extraction. Furthermore, we design a training strategy in which the style encoder is trained solely with the style reconstruction loss rather than jointly with the generator, giving the style encoder a clearer objective. Finally, extensive experiments on the AFHQ and CelebA-HQ datasets demonstrate that our approach outperforms state-of-the-art methods on reference-guided image translation and transfers the style of a specified reference image more accurately.
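To make the shared-encoder idea concrete, the sketch below shows one way a single convolutional backbone could feed both a discriminator head and a style-encoder head, with the style branch updated only by an L1 style reconstruction loss on generated images (gradients into the generator are cut). This is a minimal PyTorch sketch under assumed design choices; the names SharedBackbone, DiscriminatorHead, StyleHead, G and opt_style, as well as the layer sizes, are illustrative assumptions and not the authors' implementation.

import torch
import torch.nn as nn


class SharedBackbone(nn.Module):
    # Convolutional feature extractor shared by the discriminator and the style encoder.
    def __init__(self, img_channels=3, base_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(img_channels, base_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_dim, base_dim * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(base_dim * 2, base_dim * 4, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (B, base_dim * 4)


class DiscriminatorHead(nn.Module):
    # Per-domain real/fake logits computed on top of the shared features.
    def __init__(self, feat_dim=256, num_domains=3):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_domains)

    def forward(self, feat, domain):
        logits = self.fc(feat)                       # (B, num_domains)
        return logits.gather(1, domain.view(-1, 1))  # logit for each sample's domain


class StyleHead(nn.Module):
    # Per-domain style codes extracted from the same shared features.
    def __init__(self, feat_dim=256, style_dim=64, num_domains=3):
        super().__init__()
        self.style_dim = style_dim
        self.fc = nn.Linear(feat_dim, num_domains * style_dim)

    def forward(self, feat, domain):
        codes = self.fc(feat).view(feat.size(0), -1, self.style_dim)
        return codes[torch.arange(feat.size(0)), domain]  # (B, style_dim)


def style_reconstruction_step(backbone, style_head, G, opt_style, x, domain, style_code):
    # Train the style branch with the style reconstruction loss only:
    # the generated image is detached so no gradient flows back into G,
    # and opt_style holds only the backbone / style-head parameters.
    fake = G(x, style_code).detach()
    recon = style_head(backbone(fake), domain)
    loss_sty = torch.mean(torch.abs(recon - style_code))  # L1 reconstruction of the style code
    opt_style.zero_grad()
    loss_sty.backward()
    opt_style.step()
    return loss_sty.item()

In this layout the discriminator and style encoder share every convolutional layer and differ only in their lightweight output heads, which is what yields the parameter saving described above.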
