Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model

Ahmed Abotaleb, Mohamed W Fakhr, Mohamed Zaki

doi:10.1038/s41598-024-76407-9

Abstract

Multimodal conditioned face image generation and face super-resolution are significant areas of research. To achieve optimal results, this paper uses diffusion models as the primary engine for both tasks. It presents two main contributions: (1) “Speaking the Language of Faces” (SLF), a flexible, modular, fusion-less and architecturally simple multimodal system; and (2) a scalability scheme and a sensitivity analysis that can assist practitioners in system parameter estimation and feature selection. SLF consists of two main components: a feature vector generator (encoder) and an image generator (decoder) that uses a conditional diffusion model. SLF can accept various inputs, including low-resolution images, speech signals, person attributes (age, gender, ethnicity), or any combination of these. Moreover, scalability based on conditional scale values is used. The implementation of SLF confirmed its versatility (e.g., speech-to-face image generation and conditioned face super-resolution). We trained multiple system versions to conduct a sensitivity analysis and to determine the influence of each individual feature on the output image. The analysis showed that speaker embeddings are sufficient audio features for this task. The effects of audio signals were found to be profound and more pronounced than those of the low-resolution images (8 × 8), whose effects are still significant. The effects of gender, ethnicity and age were found to be moderate. Finally, conditional scale values significantly impact the system’s behavior and performance.
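The abstract says SLF accepts any combination of low-resolution images, speech features and person attributes without a fusion network. The sketch below illustrates one way such a conditioning vector could be assembled; it is not the authors' implementation. The modality names, the vector sizes, the zero-filling of missing inputs and the use of a per-modality multiplier as a "conditional scale" are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical modality layout: names and sizes are illustrative, not from the paper.
MODALITY_DIMS: Dict[str, int] = {
    "low_res_image": 8 * 8 * 3,  # flattened 8 x 8 RGB image
    "speaker_embedding": 4,      # placeholder size for an audio embedding
    "attributes": 3,             # age, gender, ethnicity encoded as numbers
}


def build_condition(
    inputs: Dict[str, Sequence[float]],
    scales: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Concatenate whichever modalities are present into one fixed-length vector.

    Assumptions of this sketch: a missing modality is zero-filled, and each
    present modality is multiplied by its scale (default 1.0).
    """
    scales = scales or {}
    condition: List[float] = []
    for name, dim in MODALITY_DIMS.items():
        values = inputs.get(name)
        if values is None:
            # Modality not supplied: keep its slot but fill it with zeros.
            condition.extend([0.0] * dim)
            continue
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        scale = scales.get(name, 1.0)
        condition.extend(scale * float(v) for v in values)
    return condition


# Speech-only conditioning, with the audio slot weighted by a hypothetical scale.
speech_only = build_condition(
    {"speaker_embedding": [1.0, 2.0, 3.0, 4.0]},
    scales={"speaker_embedding": 2.0},
)
```

Because every modality keeps a fixed slot, the decoder would see a vector of the same length whether one input or all three are supplied, which is one simple reading of "fusion-less".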

Journal: Scientific Reports
Publication Date: Nov 8, 2024
License type: cc-by-nc-nd
