Abstract

Text-to-image synthesis is a motivating and valuable task to explore today. The objective is to produce images from texts or captions, i.e., to take textual descriptions as input and use them to generate related images. The task combines two challenging elements, image generation and language modeling, and is considered more complex than caption generation. This research aims to develop a new text-to-image synthesis method comprising two crucial steps: (i) text-to-image encoding and (ii) an optimized GAN. First, cross-modal feature grouping is performed during text-to-image encoding, where text embeddings are transformed into textual feature vectors using a Bi-LSTM. In the following stage, images are created from this encoding: the optimized GAN receives the text feature groups as input and outputs the final synthesized images. Here, the GAN is trained with an optimization strategy that tunes its weights using the Dragon Customized Whale Optimization (DC-WO) model. Finally, the superiority of the proposed approach is evaluated by comparing it with existing techniques. On the CUB dataset, the FID value of the proposed model is 29.04%, 47.43%, 49.09%, 37.64%, and 11.35% superior to the traditional FF, WOA, GWO, DA, and GAN + CMAF models, respectively.
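
To make the two-stage pipeline concrete, the sketch below shows a minimal Bi-LSTM text encoder feeding a GAN generator, assuming PyTorch. All module names, layer sizes, and the pooling choice are illustrative assumptions for exposition, not the authors' released implementation, and the DC-WO weight-tuning step is only indicated in a comment.

```python
# Minimal sketch of the two stages described above (assumed PyTorch).
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Stage (i): encode caption token embeddings into a textual
    feature vector with a bidirectional LSTM (Bi-LSTM)."""
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        h, _ = self.bilstm(self.embed(token_ids))
        return h.mean(dim=1)  # pooled text feature, size 2 * hidden_dim

class Generator(nn.Module):
    """Stage (ii): map a text feature vector plus noise to an image."""
    def __init__(self, text_dim=256, noise_dim=100, img_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 1024), nn.ReLU(),
            nn.Linear(1024, img_pixels), nn.Tanh())

    def forward(self, text_feat, noise):
        return self.net(torch.cat([text_feat, noise], dim=1))

# Usage: encode a caption, then synthesize an image from it.
encoder, gen = TextEncoder(), Generator()
tokens = torch.randint(0, 5000, (1, 16))   # dummy caption token ids
img = gen(encoder(tokens), torch.randn(1, 100))
# In the proposed method, the GAN's weights would additionally be tuned
# by the Dragon Customized Whale Optimization (DC-WO) strategy rather
# than by gradient descent alone.
```

In this sketch the encoder's pooled output dimension (2 x 128 = 256) matches the generator's expected text feature size; the actual grouping of cross-modal features and the DC-WO update rule are described in the body of the paper.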
