Abstract

Scene graph based image generation has recently emerged as an important research direction for advanced multi-instance generation tasks. Scene layout generation, the phase that produces instance-wise visual representations and maintains the spatial relationships among all instances, is vital in translating a scene graph into an image. However, the scene layouts generated by existing methods are too coarse and semantically inconsistent with the given scene graph, which degrades the quality of the generated images. Motivated by this, we propose a novel scene graph based image generation model called DeformSg2im, which aims to generate visually appealing and semantically faithful images from the given scene graphs. Our method addresses these problems in two ways. First, we present an attention-based instance embedding estimator to refine the shape information of each instance. By introducing attention maps into the estimation of instance embeddings, our method is able to generate more plausible instances with sharp edges in the images. Second, we propose a spatial warping network (SWN) to adaptively capture the spatial dependencies among instances. With sequential modeling and geometric deformations, the SWN is capable of generating a scene layout that conforms to the given scene graph. Extensive experiments show that our model generates images with high visual quality and achieves competitive quantitative results compared with existing works. Ablation studies on the proposed modules further demonstrate the effectiveness of our method.
