Abstract

Traditional text-to-image generative adversarial network (GAN) models use a convolutional network in the discriminator to extract image features, but this fails to capture the spatial relationships among low-level objects, resulting in poor-quality generated images. To remedy this, a capsule network is introduced to improve the model: the convolutional network in the discriminator is replaced with a capsule network, thereby improving the robustness of image feature extraction. Experiments on the Oxford-102 and CUB datasets show that the new model effectively improves the quality of text-to-image generation. The FID of the generated flower images decreased by 14.49%, and the FID of the generated bird images decreased by 9.64%. Additionally, the Inception Score of images generated on the Oxford-102 and CUB datasets increased by 22.60% and 26.28%, respectively, indicating that the improved model generates richer and more meaningful image features.

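The capsule-network discriminator is only described at a high level in the abstract. Below is a minimal sketch, assuming a PyTorch implementation and 64x64 inputs, of what replacing the discriminator's final convolutional feature extractor with a primary-capsule layer plus routing-by-agreement could look like. All layer sizes, the routing iteration count, and the way the sentence embedding is fused are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn


def squash(s, dim=-1, eps=1e-8):
    # Capsule non-linearity: preserves vector direction, maps length into [0, 1).
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class CapsuleDiscriminator(nn.Module):
    """Discriminator whose final feature extractor is a capsule layer
    instead of further convolutions (illustrative sizes for 64x64 inputs)."""

    def __init__(self, text_dim=128, num_prim_caps=32, prim_dim=8,
                 out_dim=16, routing_iters=3):
        super().__init__()
        self.prim_dim = prim_dim
        self.num_prim_caps = num_prim_caps
        self.routing_iters = routing_iters
        # Convolutional stem: 64x64 RGB image -> 16x16 feature map.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
        )
        # Primary capsules: channels are grouped into prim_dim-dimensional vectors.
        self.primary = nn.Conv2d(128, num_prim_caps * prim_dim, 4, 2, 1)  # -> 8x8
        num_routes = num_prim_caps * 8 * 8
        # Transformation matrices from every primary capsule to one output capsule.
        self.W = nn.Parameter(0.01 * torch.randn(num_routes, out_dim, prim_dim))
        # Project the sentence embedding into the output-capsule space.
        self.text_proj = nn.Linear(text_dim, out_dim)

    def forward(self, image, text_embedding):
        feat = self.primary(self.stem(image))                 # (B, caps*D, 8, 8)
        b_sz = feat.size(0)
        u = feat.view(b_sz, self.num_prim_caps, self.prim_dim, -1)
        u = u.permute(0, 1, 3, 2).reshape(b_sz, -1, self.prim_dim)
        u = squash(u)                                         # (B, N, D)
        u_hat = torch.einsum('bnd,nod->bno', u, self.W)       # (B, N, out_dim)
        # Dynamic routing-by-agreement to a single "real vs. fake" capsule.
        logits = torch.zeros(b_sz, u_hat.size(1), 1, device=u_hat.device)
        for _ in range(self.routing_iters):
            c = torch.softmax(logits, dim=1)
            v = squash((c * u_hat).sum(dim=1))                # (B, out_dim)
            logits = logits + (u_hat * v.unsqueeze(1)).sum(-1, keepdim=True)
        # Fuse the text condition and score by capsule length.
        v = v + self.text_proj(text_embedding)
        return v.norm(dim=-1)  # (B,) capsule length used as the real/fake score


# Usage: score a batch of images against their sentence embeddings.
disc = CapsuleDiscriminator()
images = torch.randn(4, 3, 64, 64)
sent_emb = torch.randn(4, 128)
print(disc(images, sent_emb).shape)  # torch.Size([4])
```

Scoring by the output capsule's length mirrors how capsule classifiers read out entity presence; in a full text-to-image GAN this score would feed the adversarial loss alongside any text-matching terms.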