Abstract

Text-driven face image generation and manipulation are significant tasks, but both are challenging because of the gap between the text and image modalities. It is difficult to apply current methods to both problems at once, because these methods are usually designed for one specific task, which limits their use in real-world scenarios. To address both problems in one framework, we propose a Unified Prompt-based Cross-Modal Framework (UPCM-Frame) that bridges the gap between the text and image modalities using two large-scale pre-trained models, CLIP and StyleGAN. The proposed framework is composed of two main modules: a Text Embedding-to-Image Embedding projection module based on a special prompt embedding pair, and a projection module that maps image embeddings to semantically aligned StyleGAN embeddings, which can be used for both image generation and manipulation. Thanks to the large-scale pre-trained models, the framework is able to handle complicated descriptions and generate impressive, high-quality results. To demonstrate the effectiveness of the proposed method on the two tasks, we conduct experiments that evaluate our results both quantitatively and qualitatively.
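The two-module pipeline described above can be sketched minimally as a pair of learned projections: one mapping a text embedding to an image embedding within CLIP's joint space, and one mapping that image embedding to a StyleGAN latent. The sketch below uses plain linear maps in numpy purely for illustration; the module internals, and the exact embedding dimensions (512-d CLIP embeddings, an 18×512 W+ latent as in StyleGAN2 at 1024×1024), are assumptions, not the paper's actual architecture.

```python
import numpy as np

# Assumed dimensions: CLIP's joint embedding is 512-d (ViT-B encoders);
# StyleGAN2's W+ space for 1024x1024 faces is 18 layers x 512 dims.
CLIP_DIM, W_LAYERS, W_DIM = 512, 18, 512

rng = np.random.default_rng(0)

class LinearProjection:
    """A single linear map: the simplest stand-in for each learned module."""
    def __init__(self, d_in, d_out):
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        self.b = np.zeros(d_out)

    def __call__(self, x):
        return x @ self.W + self.b

# Module 1 (hypothetical): text embedding -> image embedding in CLIP space.
text_to_image = LinearProjection(CLIP_DIM, CLIP_DIM)
# Module 2 (hypothetical): image embedding -> flattened StyleGAN W+ latent.
image_to_stylegan = LinearProjection(CLIP_DIM, W_LAYERS * W_DIM)

text_emb = rng.standard_normal(CLIP_DIM)   # stands in for a CLIP text encoding
image_emb = text_to_image(text_emb)        # projected image embedding
w_plus = image_to_stylegan(image_emb).reshape(W_LAYERS, W_DIM)
print(w_plus.shape)  # (18, 512) -- ready to feed a StyleGAN synthesis network
```

In the actual framework, the second projection's output would be consumed by a pre-trained StyleGAN generator, so generation and manipulation both reduce to producing an appropriate W+ latent.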