Abstract

Text-to-image generation (T2I) aims to produce visually compelling images while maintaining a high degree of semantic consistency with textual descriptions. Despite the impressive progress of existing methods, two problems remain: synthesized images lack fine detail, and the correlation between the provided text description and the generated image is often insufficient. To address these issues, we propose a Context-Aware Generative Adversarial Network (CA-GAN), which generates images aligned with the input text representations. Specifically, the Context-Aware Block (CA-Block) learns a semantic-adaptive transformation based on text style, enabling the effective fusion of text descriptions and image features for high-quality image generation with better language-vision matching. Furthermore, we propose an Attention Convolution Module (ACM) that identifies more representative features and captures non-local contextual information, enabling our model to generate images with rich, detailed attributes while maintaining high quality and semantic consistency. In the discriminator, we integrate self-attention with convolution to enhance the feature maps and reinforce semantic information, emphasizing critical feature channels while suppressing extraneous ones, which ultimately yields richer, more detailed images. The experimental results demonstrate the superiority of our method over state-of-the-art (SOTA) approaches. In addition, further studies confirm the effectiveness of the generated visual details, which exhibit a high degree of alignment with the input text descriptions. Notably, our attention mechanism showcases cooperative effects that contribute to overall performance improvement. The code is available at: https://github.com/hylneu/CAGAN.
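To make the fusion idea concrete, the sketch below illustrates one common way a text-conditioned semantic-adaptive transformation can be realized: per-channel scale and shift parameters are predicted from a sentence embedding and applied to intermediate image features. This is only an illustrative assumption drawn from the abstract, not the actual CA-Block implementation (see the linked repository for that); the class name TextAdaptiveFusion and all dimensions are hypothetical.

```python
# Minimal sketch (not the authors' code): a text-conditioned adaptive
# transformation in the spirit of the CA-Block described in the abstract.
# Assumption: image features are modulated by per-channel scale/shift
# parameters predicted from a sentence embedding; names are hypothetical.
import torch
import torch.nn as nn


class TextAdaptiveFusion(nn.Module):
    """Fuses a sentence embedding into image feature maps via a learned
    channel-wise affine transformation (scale and shift)."""

    def __init__(self, text_dim: int, channels: int):
        super().__init__()
        # Two small heads map the sentence embedding to per-channel
        # modulation parameters.
        self.to_scale = nn.Sequential(nn.Linear(text_dim, channels), nn.Tanh())
        self.to_shift = nn.Linear(text_dim, channels)

    def forward(self, feat: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; sent_emb: (B, text_dim).
        gamma = self.to_scale(sent_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_shift(sent_emb).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return feat * (1 + gamma) + beta


if __name__ == "__main__":
    block = TextAdaptiveFusion(text_dim=256, channels=64)
    feats = torch.randn(4, 64, 16, 16)   # intermediate generator features
    sentence = torch.randn(4, 256)       # pooled text-encoder embedding
    out = block(feats, sentence)
    print(out.shape)  # torch.Size([4, 64, 16, 16])
```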
