Towards Better Control Of Latent Spaces For Face Editing
Generative models have demonstrated remarkable success in computer vision, synthesizing diverse and photo-realistic images. Notably, Generative Adversarial Networks (GANs) trained on faces (e.g., StyleGAN2) can be considered powerful image generation pipelines. However, the entangled feature space learned by GANs restricts precise control over the content of generated images. This paper introduces a framework designed to enhance control over face editing in image generation by disentangling the feature spaces of GANs (a.k.a. GAN spaces). Our framework aims to enable better control over the modification of face concepts such as pose, expression, and illumination. For this purpose, the framework first learns multiple latent spaces that parameterize the face concepts captured by 3D Morphable face models. It then employs an Identity-Conditioned Attention Mechanism (ICAM) to decouple face identity representations from the parameterized face concepts. Moreover, we adapt variational autoencoders to model the hierarchical structure of GAN features, incorporating transformer networks for end-to-end optimization of model parameters. Our results show that ICAM achieves state-of-the-art identity preservation and editing precision on benchmark datasets, with improved training time and memory usage.
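The abstract does not specify how the identity conditioning is wired. As a rough illustrative sketch only (the function name, weight shapes, and use of single-head cross-attention are our assumptions, not details from the paper), an identity-conditioned attention step can be read as cross-attention in which the identity code forms the query over the parameterized concept latents (pose, expression, illumination), yielding an identity-aware mixture of concept codes:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def identity_conditioned_attention(identity, concepts, Wq, Wk, Wv):
    """Hypothetical single-head cross-attention.

    identity: (d,) identity embedding -> forms the query.
    concepts: (n, d) concept latents (e.g. pose, expression,
              illumination) -> form the keys and values.
    Returns an identity-aware combination of concept codes plus
    the attention weights over the n concepts.
    """
    q = identity @ Wq                      # (d,)
    k = concepts @ Wk                      # (n, d)
    v = concepts @ Wv                      # (n, d)
    scores = k @ q / np.sqrt(q.shape[-1])  # (n,) scaled dot-products
    attn = softmax(scores)                 # (n,) weights sum to 1
    return attn @ v, attn                  # (d,), (n,)

# Toy usage with random codes (dimensions are arbitrary choices).
rng = np.random.default_rng(0)
d, n = 8, 3
identity = rng.normal(size=d)
concepts = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
edited, attn = identity_conditioned_attention(identity, concepts, Wq, Wk, Wv)
```

The intuition this sketch is meant to convey: because the query is a function of identity alone, the attention weights decide how much each concept latent may contribute to the edit, which is one plausible way to keep identity information decoupled from the edited concepts.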