Abstract

Semantic image synthesis methods learn to generate new images conditioned on predefined semantic label maps. Existing methods require access to large volumes of samples labeled with semantic maps, which limits their applicability. We propose USIS, a Unified Semantic Image Synthesis model that can be trained on only a single pair, or a few pairs, of images and semantic maps. Once trained, a USIS model can generate new images according to unseen semantic maps, as existing semantic image synthesis methods do. Specifically, we design a hierarchical architecture that reconstructs training samples and gradually learns the distributions of multi-scale patches in the samples, from coarse to fine. To avoid error accumulation across scales, we propose a mixed training strategy that stabilizes the training process. Extensive experiments on one-sample and multi-sample datasets show that the proposed model achieves state-of-the-art performance in terms of visual fidelity.
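The coarse-to-fine, multi-scale generation described above can be illustrated with a minimal sketch: an image estimate is produced at the coarsest scale, then repeatedly upsampled and refined at each finer scale, conditioned on the semantic map at that scale. All names here (`upsample`, `refine`, `synthesize`) and the blending "generator" are illustrative placeholders, not the USIS architecture or its learned networks.

```python
# Hypothetical sketch of a coarse-to-fine synthesis pipeline.
# Images and semantic maps are plain 2D grids (lists of lists of floats).

def upsample(image, factor=2):
    """Nearest-neighbor upsampling: repeat each pixel `factor` times
    along both axes."""
    return [
        [p for p in row for _ in range(factor)]
        for row in image for _ in range(factor)
    ]

def refine(image, semantic_map):
    """Placeholder for a per-scale generator: blend the current image
    estimate with the semantic map at the same resolution. A real model
    would apply a learned network here."""
    return [
        [0.5 * p + 0.5 * s for p, s in zip(img_row, sem_row)]
        for img_row, sem_row in zip(image, semantic_map)
    ]

def synthesize(semantic_maps):
    """semantic_maps: label maps ordered coarsest first, each scale
    twice the resolution of the previous one."""
    # Start from a blank estimate at the coarsest scale.
    image = [[0.0 for _ in row] for row in semantic_maps[0]]
    image = refine(image, semantic_maps[0])
    # At each finer scale: upsample the previous result, then refine it
    # conditioned on that scale's semantic map.
    for sem in semantic_maps[1:]:
        image = refine(upsample(image), sem)
    return image
```

The error accumulation the abstract mentions arises because each scale builds on the (imperfect) output of the previous one; the proposed mixed training strategy is aimed at keeping that cascade stable.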
