Transformer-based image generation from scene graphs

Renato Sortino, Simone Palazzo, Francesco Rundo, Concetto Spampinato

doi:10.1016/j.cviu.2023.103721

Open Access

Abstract

Graph-structured scene descriptions can be used efficiently in generative models to control the composition of the generated image. Previous approaches combine graph convolutional networks for layout prediction with adversarial methods for image generation. In this work, we show that employing multi-head attention to encode the graph information, together with a transformer-based model operating in the latent space for image generation, can improve the quality of the sampled data without adversarial models, with a consequent advantage in training stability. Specifically, the proposed approach is based entirely on transformer architectures, both for encoding scene graphs into intermediate object layouts and for decoding these layouts into images, passing through a lower-dimensional space learned by a vector-quantized variational autoencoder (VQ-VAE). Our approach improves image quality with respect to state-of-the-art methods and yields a higher degree of diversity among multiple generations from the same scene graph. We evaluate our approach on three public datasets: Visual Genome, COCO, and CLEVR. We achieve an Inception Score of 13.7 and 12.8, and an FID of 52.3 and 60.3, on COCO and Visual Genome, respectively. We also perform ablation studies to assess the impact of each component. Code is available at https://github.com/perceivelab/trf-sg2im.
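The abstract states that images pass through a lower-dimensional space learned by a VQ-VAE. As a minimal sketch (not the authors' implementation, which is available at the repository linked above), the core quantization step of a standard VQ-VAE replaces each continuous encoder output with its nearest codebook vector; in a typical two-stage setup, the resulting indices form the discrete token sequence that a transformer models. The function name `vector_quantize`, the toy codebook, and the 2x2 feature map below are hypothetical illustrations, not the paper's settings.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous latent vector to its nearest codebook entry.

    Hypothetical helper for illustration. Returns the discrete token indices
    (the sequence a latent-space transformer would model) and the quantized
    vectors (what a VQ-VAE decoder would receive).
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Nearest neighbour: argmin_k ||z - e_k||^2 over codebook entries e_k.
        k = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


if __name__ == "__main__":
    # Toy 2-D codebook with K = 3 entries (illustrative values only).
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
    # A 2x2 "feature map" flattened in raster order into 4 latent vectors.
    latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.4], [0.6, 0.7]]
    tokens, zq = vector_quantize(latents, codebook)
    print(tokens)  # [0, 1, 2, 1]
```

Running it prints `[0, 1, 2, 1]`. Training an actual VQ-VAE additionally requires the codebook and commitment losses and a straight-through gradient estimator, which this sketch omits.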

Journal: Computer Vision and Image Understanding
Publication Date: May 18, 2023
Citations: 3
License type: CC BY

Similar Papers

High-Quality Image Generation from Scene Graphs with Transformer
Xin Zhao ... Bin Gong
18 Jul 2022

Improving text-to-image generation with object layout guidance
Jezia Zakraoui ... Moutaz Saleh
Multimedia Tools and Applications | VOL. 80
20 May 2021

Image Generation from Scene Graph with Object Edges
Chenxing Li ... Xiaoming Tao
01 Sep 2022

Image generation models from scene graphs and layouts: A comparative analysis
Muhammad Umair Hassan ... Ibrahim A Hameed
Journal of King Saud University - Computer and Information Sciences | VOL. 35
06 Apr 2023
