Abstract

Although progress has been made in layout-to-image generation of complex scenes with multiple objects, object-level generation still suffers from distortion and poor recognizability. We argue that this is caused by the lack of feature encodings for edge information during image generation. To address these limitations, we propose a novel edge-enhanced Generative Adversarial Network for layout-to-image generation (termed EL-GAN). The feature encodings of edge information are learned from the multi-level features output by the generator and are iteratively optimized along the generator's pipeline. Two new components are included at each generator level to enable multi-scale learning. The first is the edge generation module (EGM), which converts the multi-level features output by the generator into images of different scales and extracts their edge maps. The second is the edge fusion module (EFM), which integrates the feature encodings refined from the edge maps into the subsequent image generation process by modulating the parameters of the normalization layers. In addition, the discriminator is fed with frequency-sensitive image features, which enhances the generation quality of both the high-frequency edge contours and the low-frequency regions of the image. Extensive experiments show that EL-GAN outperforms state-of-the-art methods on the COCO-Stuff and Visual Genome datasets. Our source code is available at https://github.com/Azure616/EL-GAN.
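The two ideas named in the abstract, extracting an edge map from an intermediate image and using it to modulate normalization parameters, can be illustrated with a minimal NumPy sketch. This is not the EL-GAN implementation: the abstract does not specify the edge operator or the form of the modulation, so the Sobel filter, the per-pixel scale-and-shift formulation, and the names `sobel_edges`, `edge_modulated_norm`, `w_gamma`, and `w_beta` are all assumptions made for illustration.

```python
import numpy as np


def sobel_edges(img):
    """Sobel gradient magnitude of a 2D grayscale image of shape (H, W).

    Assumption: Sobel is used here only as a stand-in edge operator.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 cross-correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_modulated_norm(feat, edge, w_gamma, w_beta, eps=1e-5):
    """Normalize features per channel, then scale and shift them by an edge map.

    feat:    (C, H, W) feature maps
    edge:    (H, W) edge map
    w_gamma: (C,) hypothetical per-channel weights producing the scale
    w_beta:  (C,) hypothetical per-channel weights producing the shift
    """
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    normed = (feat - mean) / (std + eps)
    # Spatially varying modulation parameters derived from the edge map.
    gamma = w_gamma[:, None, None] * edge[None, :, :]
    beta = w_beta[:, None, None] * edge[None, :, :]
    return normed * (1.0 + gamma) + beta


if __name__ == "__main__":
    image = np.zeros((8, 8))
    image[:, 4:] = 1.0  # vertical step edge
    edges = sobel_edges(image)
    features = np.random.default_rng(0).normal(size=(3, 8, 8))
    out = edge_modulated_norm(features, edges, np.ones(3), np.ones(3))
    print(out.shape)
```

In this sketch a fixed linear map stands in for whatever learned layers would produce the scale and shift. With a zero edge map the function reduces to plain per-channel normalization, which makes the effect of the edge signal easy to isolate.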
