Abstract

The ability to synthesize photo-realistic fashion product images conditioned on multiple attributes or modalities would enable many new applications. In this work, we propose an end-to-end network architecture, built upon a new generative adversarial network, for automatically synthesizing photo-realistic images of fashion products under multiple conditions. Given an input pose image that consists of a 2D skeleton pose, together with a sentence description of products, our model synthesizes a fashion image that preserves the same pose and shows the fashion products described in the text. Specifically, the generator $G$ tries to generate realistic-looking fashion images conditioned on a $\langle \mathsf{pose}, \mathsf{text} \rangle$ pair in order to fool the discriminator. An attention network is added to enhance the generator; it predicts a probability map indicating which parts of the image need to be attended to for translation. In contrast, the discriminator $D$ distinguishes real images from the translated ones based on the input pose image and text description. The discriminator is divided into two multi-scale sub-discriminators to improve its ability to distinguish real images from synthesized ones. Quantitative and qualitative analyses demonstrate that our method is capable of synthesizing realistic images that retain the poses of the given images while matching the semantics of the provided sentence descriptions.
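To make the roles of the attention map and the multi-scale inputs more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the attention probability map is used as a per-pixel blending weight between a base image and the translated image, and that a coarser discriminator scale is obtained by 2x average pooling; the abstract states neither detail. The names `blend_with_attention`, `downsample2x`, `base`, and `translated` are hypothetical, and random arrays stand in for network outputs.

```python
import numpy as np


def sigmoid(x):
    """Map raw attention logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def blend_with_attention(base, translated, attention_logits):
    """Assumed blending rule: attention near 1 takes the translated pixel,
    attention near 0 keeps the base pixel."""
    attention = sigmoid(attention_logits)[..., None]  # shape (H, W, 1)
    return attention * translated + (1.0 - attention) * base


def downsample2x(image):
    """Assumed coarser scale: 2x2 average pooling over height and width."""
    h, w, c = image.shape
    cropped = image[: h - h % 2, : w - w % 2]
    return cropped.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


# Random stand-ins for real images and generator outputs.
rng = np.random.default_rng(0)
base = rng.random((8, 8, 3))
translated = rng.random((8, 8, 3))

# Attend strongly to the central 4x4 region and ignore everything else.
logits = np.full((8, 8), -20.0)
logits[2:6, 2:6] = 20.0

output = blend_with_attention(base, translated, logits)
coarse = downsample2x(output)  # input for a hypothetical second-scale sub-discriminator
```

Under these assumptions, pixels inside the attended region come almost entirely from `translated`, pixels outside it are left as in `base`, and `coarse` is the half-resolution view that a second sub-discriminator could receive.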
