DSG-GAN: Multi-turn text-to-image synthesis via dual semantic-stream guidance with global and local linguistics

Heyu Sun, Qiang Guo

doi:10.1016/j.iswa.2023.200271

Open Access

Abstract

The multi-turn text-to-image synthesis task aims to manipulate desired visual content step by step according to the user's intention, and it has recently attracted considerable research interest in the language and vision community. Compared with traditional text-to-image synthesis, multi-turn synthesis is more challenging because 1) it must continuously recognize the user's intention from spoken instructions and perceive the visual information in the source image; and 2) it requires reasoning about the position, appearance, and characteristics of the new modifications in target images, as well as connecting objects named in the instructions with visual components in the source images. To address these challenges, we propose DSG-GAN, a Generative Adversarial Network with Dual Semantic-stream Guidance from global and local linguistics, which reasons about and learns the user's intention from the text description and iteratively manipulates visual information. Specifically, we design a novel dual semantic-stream discriminator that works with a hierarchical instruction encoder to evaluate the logic and quality of the match between the human intention expressed in the linguistic instruction and the generated visual content, from the perspective of both global and fine-grained consistency matching. Meanwhile, the discriminator's backpropagated gradient is used to optimize the instruction encoder, which encourages it to distill the user's intention into global and local information consistent with the visual representation of the manipulation. Extensive experiments show that, even when producing high-resolution images and over many iterative turns, our method performs significantly better, owing to the combination of local fine-grained linguistic information with cross-modal correlation.
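The idea of scoring an image against an instruction along two streams, one global and one fine-grained, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the scoring functions, so the cosine-similarity global score, the attention-based word-to-region local score, the `temperature` and `local_weight` parameters, and all function names below are assumptions made purely for illustration. It also covers scoring only, not the gradient-based training of the instruction encoder.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_score(sentence_vec, image_vec):
    """Global stream (assumed form): sentence-level vs. whole-image similarity."""
    return cosine(sentence_vec, image_vec)


def local_score(word_vecs, region_vecs, temperature=4.0):
    """Local stream (assumed form): each word attends over image regions,
    then is compared with its attended region context."""
    dim = len(region_vecs[0])
    scores = []
    for w in word_vecs:
        sims = [cosine(w, r) for r in region_vecs]
        attn = softmax([temperature * s for s in sims])
        context = [sum(a * r[d] for a, r in zip(attn, region_vecs))
                   for d in range(dim)]
        scores.append(cosine(w, context))
    return sum(scores) / len(scores)


def dual_stream_score(sentence_vec, word_vecs, image_vec, region_vecs,
                      local_weight=0.5):
    """Hypothetical combination: weighted sum of global and local scores."""
    g = global_score(sentence_vec, image_vec)
    l = local_score(word_vecs, region_vecs)
    return (1.0 - local_weight) * g + local_weight * l


# Toy features: two image regions and two instruction words.
regions = [[1.0, 0.0], [0.0, 1.0]]
image = [1.0, 1.0]
matched = dual_stream_score([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]],
                            image, regions)
mismatched = dual_stream_score([-1.0, -1.0], [[-1.0, 0.0], [0.0, -1.0]],
                               image, regions)
print(matched > mismatched)
```

In this toy setup, an instruction whose word vectors align with the image regions scores higher than one whose vectors point away from them. In a trained model, the features would come from learned text and image encoders rather than hand-written vectors.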

Journal: Intelligent Systems with Applications
Publication Date: Aug 30, 2023
License type: CC BY-NC-ND
