Abstract

Current AI systems have shown impressive results on the task of automatically synthesizing realistic images from text descriptions. Generative Adversarial Networks (GANs) are widely used for text-to-image generation: the generator synthesizes realistic images from noise and sentence vectors, while the discriminator estimates the probability that a synthesized image is real. In this paper, in order to generate images from Arabic text, we combine DF-GAN, a simple and efficient text-to-image generation framework, with the AraBERT architecture. To this end, we first create new datasets suited to the Arabic text-to-image generation task by translating the text descriptions of the original datasets from English to Arabic with DeepL Translator. Second, we leverage AraBERT, which is trained on billions of Arabic words, to produce strong sentence embeddings, and we reduce the embedding dimension to match the shape expected by DF-GAN. Third, we inject the reduced sentence embedding into the UPBlocks of DF-GAN and train the proposed architecture on two challenging datasets. Following previous work, we use CUB and Oxford-102 Flowers as the original datasets, and we evaluate our framework with the Fréchet Inception Distance (FID) and the Inception Score (IS). To the best of our knowledge, our framework is the first to achieve substantial success in generating high-resolution, realistic, text-matching images conditioned on Arabic text.
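To make the embedding step concrete, the following is a minimal sketch of how an Arabic caption could be encoded with AraBERT and projected down to a DF-GAN-compatible sentence vector. The checkpoint name aubmindlab/bert-base-arabertv2, the use of the [CLS] hidden state as the sentence embedding, and the 256-dimensional target size (DF-GAN's default sentence-embedding width) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Assumed AraBERT checkpoint; AraBERT models are published under the
# aubmindlab namespace on the HuggingFace Hub.
MODEL_NAME = "aubmindlab/bert-base-arabertv2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
arabert = AutoModel.from_pretrained(MODEL_NAME)
arabert.eval()

# Assumed target size: DF-GAN conditions its UPBlocks on a 256-dim sentence
# vector, so we project AraBERT's 768-dim hidden state down to 256.
reduce_dim = nn.Linear(arabert.config.hidden_size, 256)

def embed_sentence(arabic_caption: str) -> torch.Tensor:
    """Encode an Arabic caption into a reduced sentence embedding."""
    inputs = tokenizer(arabic_caption, return_tensors="pt",
                       truncation=True, max_length=64)
    with torch.no_grad():
        outputs = arabert(**inputs)
    # Take the [CLS] token state as the sentence embedding (one common
    # choice; mean pooling over tokens is an alternative).
    cls_state = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
    return reduce_dim(cls_state)                    # shape: (1, 256)

sent_emb = embed_sentence("طائر صغير بريش أزرق وأجنحة قصيرة")
print(sent_emb.shape)  # torch.Size([1, 256])
```

In training, the projection layer would be optimized jointly with the generator so that the reduced embedding adapts to DF-GAN's conditioning blocks, while the pretrained AraBERT weights could be kept frozen or fine-tuned.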
