Abstract

An image is the abstraction of a thousand words. The meaning and essence of complex topics, ideas, and concepts can often be conveyed more effectively by a single image than by a lengthy verbal description. It is therefore essential to teach computers not only to recognize and classify images but also to generate them. Despite significant advances in generative modeling, controlled image generation depicting multiple, complex objects remains a challenging task in computer vision. Among the core challenges, scene graph-based and scene layout-based image generation is a significant problem, as it requires generative models to reason about object relationships and compositionality. Owing to their ease of use and reduced time and labor costs, models that synthesize images from scene graphs and layouts are proliferating. Given this growing number of scene-graph-to-image and layout-to-image generation models, a unified experimental methodology is needed to evaluate controlled image generation. To this end, we present a standard methodology for evaluating the performance of scene graph-based and scene layout-based image generation models. We perform a comparative analysis of these models and analyze their different components on the Visual Genome and COCO-Stuff datasets. The experimental results show that scene layout-based image generation outperforms its graph-based counterpart in most quantitative and qualitative evaluations.

