Abstract

In computer vision research, user-controllable image synthesis is a significant and challenging task. At present, two approaches are available. The first uses a simple contour to determine the shape of the synthesized object; it works well, but it controls only the shape of the object, not its specific content. The second employs a text description to synthesize the corresponding image; it effectively controls the specific content of the synthesis but offers no control over the synthesized shape. In this paper, we propose a highly flexible, human-customizable image synthesis model based on a simple contour and a natural language description, in which both the contour and the text description can be specified by the user. The contour determines the basic shape of the synthesized object, and the natural language describes its specific content. On this basis, highly authentic and customizable images can be synthesized. Experiments are conducted on the Caltech-UCSD Birds (CUB) and Oxford-102 flower datasets, and the results demonstrate the effectiveness and superiority of our method: the synthesized images not only preserve the contour but also conform to the natural language description. In addition, high-quality synthesis results based on hand-drawn contours and text descriptions are shown to illustrate the flexibility and customizability of our model.
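As a rough illustration of this two-stream conditioning, the following PyTorch sketch shows a generator that consumes both a contour image and a text embedding. The module names, layer sizes, and overall layout are illustrative assumptions for exposition, not the paper's actual architecture.

    # Minimal sketch (assumed architecture, not the paper's): a generator
    # conditioned on a contour image ("where") and a text embedding ("what").
    import torch
    import torch.nn as nn

    class ContourTextGenerator(nn.Module):
        def __init__(self, text_dim=256, z_dim=100):
            super().__init__()
            # Encode the single-channel contour into a spatial feature map,
            # so synthesis can follow the drawn shape.
            self.contour_enc = nn.Sequential(
                nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            )
            # Project text embedding + noise, then broadcast the result over
            # the contour feature map so every location sees the description.
            self.cond_proj = nn.Linear(text_dim + z_dim, 128)
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1), nn.Tanh(),
            )

        def forward(self, contour, text_emb, z):
            f = self.contour_enc(contour)                        # (B, 128, H/4, W/4)
            c = self.cond_proj(torch.cat([text_emb, z], dim=1))  # (B, 128)
            c = c[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
            return self.decoder(torch.cat([f, c], dim=1))        # (B, 3, H, W)

    # Example usage with illustrative shapes:
    # g = ContourTextGenerator()
    # img = g(torch.randn(2, 1, 64, 64), torch.randn(2, 256), torch.randn(2, 100))

The design point is that the contour enters as a spatial map, preserving shape and location, while the text embedding is broadcast over every spatial position to supply the content, so the decoder can satisfy both constraints at once.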

Highlights

  • Image synthesis has long been a core research topic in computer vision

  • The method is validated on the Caltech-UCSD Birds [11] dataset and the Oxford-102 flower [12] dataset, with 10 collected text descriptions [60] for each image

  • In this work, we propose a customizable image synthesis model based on contours and text descriptions


Summary

INTRODUCTION

Image synthesis has long been a core research topic in computer vision, and with the development of deep learning it has achieved many breakthroughs in recent years. For image synthesis based on text descriptions, many works have been carried out and encouraging results achieved. However, none of these works can control the shape, size, or position of the synthesized object. To alleviate this problem and gain finer control over synthesis details, Reed et al. proposed the Generative Adversarial What-Where Network (GAWWN) [10], which uses a bounding box and key points to determine the location and shape of the target and generates specific content based on the text description.
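To make this "what-where" conditioning concrete, here is a minimal sketch of how a text embedding can be tied to a bounding box on a feature-map grid. The function name, shapes, and normalization convention are assumptions for illustration and do not reproduce the original GAWWN implementation.

    # Hedged sketch: replicate a text embedding only inside a bounding box on
    # a coarse grid, binding "what" (the description) to "where" (the box).
    import torch

    def spatial_text_condition(text_emb, box, grid=16):
        """text_emb: (B, D) sentence embeddings; box: (B, 4) boxes given as
        normalized (x0, y0, x1, y1) coordinates in [0, 1]."""
        B, D = text_emb.shape
        cond = torch.zeros(B, D, grid, grid)
        for i in range(B):
            x0, y0, x1, y1 = (box[i] * grid).long().tolist()
            # Broadcast the description into the box region only; the rest
            # of the grid stays zero, marking "no object here".
            cond[i, :, y0:max(y1, y0 + 1), x0:max(x1, x0 + 1)] = \
                text_emb[i].view(D, 1, 1)
        return cond  # typically concatenated with noise/feature maps downstream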

RELATED WORK
TRAINING DETAILS AND FURTHER EXPLORATION
FINDINGS
CONCLUSION