Abstract

In this paper, we propose a multimodal and multiclass semi-supervised image-to-image translation (MM-SSIT) framework to address the dilemma between expensive labeling work and the diversity requirements of image translation. A cross-domain adversarial autoencoder is proposed to learn disentangled latent domain-invariant content codes and domain-specific style codes. The style codes are matched to a prior distribution so that a series of meaningful samples can be generated from the prior space. The content codes are embedded into a multiclass joint data distribution through adversarial learning between a domain classifier and a category classifier, so that multiclass images can be generated at once. Consequently, multimodal and multiclass cross-domain images are generated by jointly decoding the latent content codes and the sampled style codes. Finally, the networks of the MM-SSIT framework are designed and tested. Semi-supervised experiments with comparisons to state-of-the-art approaches show that the proposed framework can generate high-quality and diverse images with fewer labeled samples. Further experiments in the unsupervised setting demonstrate that MM-SSIT is superior in learning disentangled representations and in domain adaptation.
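To make the encode/decode path described above concrete, the following is a minimal sketch (not the authors' released code) of a disentangled content/style autoencoder in PyTorch. It assumes 64x64 RGB inputs, an 8-dimensional style code, and a standard-normal prior over style codes; the module names (ContentEncoder, StyleEncoder, Decoder) and layer sizes are illustrative assumptions, and the domain/category classifiers used for adversarial training are omitted.

```python
# Minimal sketch of the disentangled encode/decode path (assumptions:
# 64x64 RGB images, 8-dim style code, N(0, I) prior over style codes).
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    # Maps an image to a domain-invariant content feature map.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class StyleEncoder(nn.Module):
    # Maps an image to a low-dimensional, domain-specific style code,
    # which training would match to the prior distribution.
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, style_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # Jointly decodes a content feature map and a style code into an image.
    def __init__(self, style_dim=8):
        super().__init__()
        self.style_proj = nn.Linear(style_dim, 128)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )
    def forward(self, content, style):
        s = self.style_proj(style).unsqueeze(-1).unsqueeze(-1)
        return self.net(content + s)

if __name__ == "__main__":
    enc_c, dec = ContentEncoder(), Decoder()
    x = torch.randn(4, 3, 64, 64)      # source-domain image batch
    content = enc_c(x)                 # domain-invariant content codes
    style = torch.randn(4, 8)          # style codes sampled from the prior
    y = dec(content, style)            # multimodal translated images
    print(y.shape)                     # torch.Size([4, 3, 64, 64])
```

Sampling several style codes from the prior for the same content code yields multiple translations of one input, which is the multimodal behavior the abstract refers to.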
