Abstract

Synthesizing vivid images from descriptive text is emerging as a frontier cross-domain generation task. However, a single sentence is rarely sufficient for generating a high-quality image accurately, because of the information asymmetry between the two modalities; external knowledge is needed to compensate for this gap. Moreover, the limited description of the entities in a sentence cannot guarantee semantic consistency between the text and the generated image, leading to a lack of detail in both the foreground and the background. Here, we propose a commonsense-driven generative adversarial network that generates photo-realistic images conditioned on entity-related commonsense knowledge. The network contains two key commonsense-based modules: (a) entity semantic augmentation, which enriches entity semantics with commonsense knowledge to reduce the information asymmetry, and (b) adaptive entity refinement, which generates the high-resolution image under the guidance of diverse commonsense knowledge across multiple stages to maintain text-image consistency. We present extensive synthesis results on the widely used CUB-Birds (Caltech-UCSD Birds-200-2011) dataset, where our model achieves competitive performance compared with other state-of-the-art models.
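The abstract only names the two modules, so the sketch below is one plausible way to realize them in PyTorch: cross-attention fuses entity embeddings with retrieved commonsense embeddings, and a per-stage refinement block conditions image features on the augmented entities. All class names, dimensions, and the attention-based fusion are our assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class EntitySemanticAugment(nn.Module):
    """Illustrative entity semantic augmentation: entity embeddings attend to
    retrieved commonsense-fact embeddings (assumed design, not the paper's)."""
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, entity_emb, knowledge_emb):
        # entity_emb: (B, n_entities, dim); knowledge_emb: (B, n_facts, dim)
        ctx, _ = self.attn(entity_emb, knowledge_emb, knowledge_emb)
        return self.norm(entity_emb + ctx)  # residual fusion of commonsense

class AdaptiveEntityRefine(nn.Module):
    """Illustrative adaptive entity refinement for one generator stage:
    each spatial location attends to the augmented entity embeddings."""
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, img_feat, aug_entity_emb):
        # img_feat: (B, dim, H, W) -> queries over H*W spatial positions
        B, C, H, W = img_feat.shape
        q = img_feat.flatten(2).transpose(1, 2)            # (B, H*W, C)
        ctx, _ = self.attn(q, aug_entity_emb, aug_entity_emb)
        fused = (q + ctx).transpose(1, 2).reshape(B, C, H, W)
        return img_feat + self.conv(fused)                 # residual refinement

# Toy usage with random tensors (shapes are assumptions):
aug, refine = EntitySemanticAugment(), AdaptiveEntityRefine()
ent = torch.randn(2, 5, 256)        # 5 entity embeddings per caption
kb = torch.randn(2, 12, 256)        # 12 retrieved commonsense facts
img = torch.randn(2, 256, 16, 16)   # mid-stage image features
out = refine(img, aug(ent, kb))     # -> (2, 256, 16, 16)
```

In a multistage generator, a block like `AdaptiveEntityRefine` would be applied at each resolution so the commonsense-augmented entities keep steering both foreground and background detail.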

