Abstract

Existing image generation methods often produce blurry, unrealistic results that lack layering and structure. Depth information can be used to accurately control the relative positions and hierarchy of objects in an image. Our goal is to improve the realism, hierarchy, and quality of images generated in image-to-image tasks by exploiting depth information. To this end, we propose a multi-conditional semantic image generation method that fuses depth information. The method builds on the Generative Adversarial Network architecture: it takes paired semantic labels and depth maps as input and fuses them through our proposed Multi-scale Feature Extraction and Information Fusion Module. Furthermore, we add a channel-attention mechanism to the generator to strengthen inter-channel connectivity and suppress confusion between different semantic features. With only a modest increase in training cost, the proposed module generates realistic images that match the input semantic layout. In extensive experiments on three challenging datasets, the images generated by our model achieve superior visual quality and quantitative metrics, demonstrating the effectiveness of the proposed method.
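
To illustrate the two components named above, the sketch below pairs a squeeze-and-excitation style channel-attention block with a dilated-convolution multi-scale fusion module over a (semantic label, depth map) input pair. This is a minimal sketch under our own assumptions: the class names, dilation rates, and channel sizes are hypothetical stand-ins, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: one common way to
    realize the channel-attention mechanism described in the abstract."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global spatial average
        self.fc = nn.Sequential(              # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight channels

class MultiScaleFusion(nn.Module):
    """Hypothetical stand-in for the Multi-scale Feature Extraction and
    Information Fusion Module: extract features from the concatenated
    label/depth input at several dilation rates, then fuse them."""
    def __init__(self, label_ch: int, depth_ch: int = 1, out_ch: int = 64):
        super().__init__()
        in_ch = label_ch + depth_ch
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)  # parallel scales
            for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)            # 1x1 fusion
        self.attn = ChannelAttention(out_ch)

    def forward(self, label: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([label, depth], dim=1)                    # paired inputs
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.attn(self.fuse(feats))

# Usage: a one-hot semantic layout (35 classes here) paired with its depth map.
label = torch.randn(2, 35, 128, 128)
depth = torch.randn(2, 1, 128, 128)
out = MultiScaleFusion(label_ch=35)(label, depth)  # -> (2, 64, 128, 128)
```

In this reading, the fused, attention-reweighted feature map would feed the generator of the GAN, so that depth cues constrain the relative placement of objects while channel attention keeps features of different semantic classes from mixing.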
