Abstract

The field of image generation has advanced significantly in recent years, particularly through Generative Adversarial Networks (GANs). Many capable GAN models have emerged that can synthesize realistic images and manipulate images using text. However, most existing work is limited to generating simple images, such as flowers, from captions. This project develops a deep learning model and system that generates realistic human facial images from textual descriptions, extending text-to-image synthesis to the less-explored domain of face generation from fine-grained descriptions of the human face, e.g. "A person has curly hair, an oval face, and a mustache". To achieve this, the system uses a VQGAN whose output is conditioned on the given text using CLIP (Contrastive Language-Image Pre-training). The model is trained on the CelebA face image dataset. Overall, the project seeks to improve the speed and accuracy of mapping text to faces by leveraging recent advances in deep learning and image generation.
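To make the conditioning mechanism concrete, the sketch below shows the standard CLIP-guided VQGAN generation loop that the abstract describes: a latent code is optimized so that the decoded image's CLIP embedding matches the embedding of the text prompt. This is an illustrative sketch, not the paper's implementation: `load_vqgan` and the latent shape are hypothetical stand-ins for a pretrained VQGAN decoder (e.g. from the taming-transformers repository), and CLIP's usual input normalization is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the CLIP image/text encoder; .float() avoids fp16/fp32 mismatch
# when backpropagating through the image encoder on GPU.
perceptor, _ = clip.load("ViT-B/32", device=device)
perceptor = perceptor.float()

# Hypothetical helper: stands in for loading a pretrained VQGAN whose
# decode(z) maps a latent code to an image tensor in [-1, 1].
vqgan = load_vqgan().to(device)

prompt = "A person has curly hair, an oval face, and a mustache"
with torch.no_grad():
    text_features = perceptor.encode_text(clip.tokenize(prompt).to(device))

# Assumed latent shape (1, 256, 16, 16); the real shape depends on the
# VQGAN configuration. Only z is optimized; both networks stay frozen.
z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)

for step in range(300):
    image = vqgan.decode(z)                     # (1, 3, H, W) in [-1, 1]
    image = F.interpolate(image, size=224)      # resize to CLIP's input size
    image_features = perceptor.encode_image(image)
    # Maximize cosine similarity between image and text embeddings.
    loss = -torch.cosine_similarity(image_features, text_features).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice, implementations of this scheme also normalize the image with CLIP's mean and standard deviation, apply random augmentations ("cutouts") before encoding to stabilize the guidance, and quantize the latent against the VQGAN codebook; those details are left out here to keep the loop minimal.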
