Abstract

In recent years, deep learning has achieved breakthroughs in computer vision. Trained on large-scale data, deep neural networks, represented by generative adversarial networks (GANs), significantly outperform earlier techniques in image generation, producing images that are more plausible, higher-definition, more complex, and more accurate. As datasets, models, and applications continue to develop, generating images by fusing information from different modalities, such as natural language, semantic layouts, tags, and edge maps, has become a new demand and challenge. Reviews exist on image generation and on multimodal deep learning separately. However, no review has been dedicated to multimodal deep learning image generation and to the current status, open problems, and challenges of this task. This paper therefore presents a survey of multimodal deep learning image generation. It introduces readers to application scenarios for the task, and it covers recent techniques, the relevant datasets, the evaluation metrics used, and some comparisons of results. Finally, it describes some of the challenges and future research topics in multimodal deep learning image generation.
