Abstract

Unpaired image-to-image translation among category domains has achieved remarkable success over the past decade. Recent studies mainly focus on two challenges. First, such translation is inherently multi-modal (i.e., a many-to-many mapping) owing to variations in domain-specific information (e.g., the house-cat domain contains multiple sub-modes); this is usually addressed by sampling from a predefined distribution. Second, most existing multi-modal approaches are limited to two domains per model, i.e., they must independently build two distributions to capture the variations of every pair of domains. To address these problems, we propose a Hierarchical Image-to-image Translation (HIT) method that jointly formulates the multi-domain and multi-modal translation problem within a semantic hierarchy by modeling a common, nested distribution space. Specifically, domains stand in inclusion relationships under a given hierarchy. Under a Gaussian prior on each domain, the distributions of lower-level domains capture local variations of their higher-level ancestors, yielding so-called nested distributions. To this end, we propose a nested distribution loss, built on distribution divergence measures and information entropy, to characterize these inclusion relations among domain distributions. Experiments on the ImageNet, ShapeNet, and CelebA datasets show that HIT compares favorably with state-of-the-art methods; as an additional benefit of nested modeling, the uncertainty of multi-modal translations can be controlled at different hierarchy levels.
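The abstract does not give the exact form of the nested distribution loss. As a rough illustration only, a minimal sketch follows, assuming diagonal Gaussian priors per domain: a KL-divergence term pulls a child domain's distribution inside its parent's, and an entropy hinge keeps the child strictly more concentrated than its ancestor. All names here (nested_distribution_loss, margin, etc.) are hypothetical and not taken from the paper.

```python
import math
import torch

def kl_diag_gaussians(mu_c, logvar_c, mu_p, logvar_p):
    """KL( N(mu_c, var_c) || N(mu_p, var_p) ) between diagonal
    Gaussians, summed over the latent dimensions."""
    var_c, var_p = logvar_c.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_c + (var_c + (mu_c - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1)

def entropy_diag_gaussian(logvar):
    """Differential entropy of a diagonal Gaussian, summed over dims."""
    d = logvar.size(-1)
    return 0.5 * (d * math.log(2 * math.pi * math.e) + logvar.sum(dim=-1))

def nested_distribution_loss(mu_c, logvar_c, mu_p, logvar_p, margin=0.0):
    """Hypothetical nesting penalty for a (child, parent) domain pair:
    - KL term keeps the child's probability mass inside the parent's;
    - entropy hinge keeps the child narrower than the parent by `margin`."""
    kl = kl_diag_gaussians(mu_c, logvar_c, mu_p, logvar_p)
    ent_gap = torch.clamp(
        entropy_diag_gaussian(logvar_c) - entropy_diag_gaussian(logvar_p) + margin,
        min=0.0,
    )
    return (kl + ent_gap).mean()
```

Under this reading, summing the penalty over all parent-child edges of the hierarchy would enforce the nesting globally, with lower-level (more specific) domains occupying progressively smaller regions of the shared latent space.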
