Abstract

In this article, we propose a method to jointly control face image generation with semantic segmentation maps and text. Existing semantic segmentation maps lack detailed face attributes such as beards, and attributes like the gender of the target person are difficult to represent explicitly through a semantic map alone. State-of-the-art face image generation methods guided by semantic segmentation maps mostly address this by introducing the original image for supervision, which cannot accurately control the detailed attributes of the target face. At the same time, text-guided image generation methods perform poorly at controlling whether the face is shown in a frontal or profile pose. We therefore propose to let the semantic segmentation map control the coarse content of the target image while the text controls its fine details. Through a carefully designed mapping network and content mixing mechanism, our model flexibly draws on the advantages of the two modalities and can generate high-resolution images that are diverse, high quality, and more faithful to the target's detailed attributes than most previous methods. Extensive experiments demonstrate the superior performance of the proposed method in terms of accuracy and fidelity.
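To make the coarse/fine split concrete, the sketch below illustrates one plausible way such a mapping network and content mixing mechanism could be wired up. The abstract does not specify the architecture, so every module name, dimension, and mixing rule here is a hypothetical assumption rather than the authors' actual implementation: segmentation-map features stand in for the coarse content, and a text embedding passed through an MLP mapping network modulates them to inject fine attributes.

```python
# Illustrative sketch only; all architectural details are assumptions, not the paper's method.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps a text embedding to a style code controlling fine attributes (assumed MLP)."""
    def __init__(self, text_dim=512, style_dim=512, n_layers=4):
        super().__init__()
        layers, dim = [], text_dim
        for _ in range(n_layers):
            layers += [nn.Linear(dim, style_dim), nn.LeakyReLU(0.2)]
            dim = style_dim
        self.net = nn.Sequential(*layers)

    def forward(self, text_emb):
        return self.net(text_emb)

class ContentMixingBlock(nn.Module):
    """Modulates coarse segmentation-derived features with the text style code."""
    def __init__(self, channels, style_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_scale = nn.Linear(style_dim, channels)
        self.to_shift = nn.Linear(style_dim, channels)

    def forward(self, coarse_feat, style):
        # coarse_feat: (B, C, H, W) features from a segmentation-map encoder (layout / pose)
        # style:       (B, style_dim) code from the text mapping network (fine attributes)
        scale = self.to_scale(style).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(style).unsqueeze(-1).unsqueeze(-1)
        return self.norm(coarse_feat) * (1 + scale) + shift

# Toy usage with random tensors standing in for real encoder outputs.
seg_features = torch.randn(2, 256, 32, 32)   # coarse content from the segmentation map
text_embedding = torch.randn(2, 512)         # e.g. a CLIP-style sentence embedding
style = MappingNetwork()(text_embedding)
mixed = ContentMixingBlock(256)(seg_features, style)
print(mixed.shape)  # torch.Size([2, 256, 32, 32])
```

The design choice illustrated here is that the segmentation features keep their spatial layout (controlling pose and region shapes), while the text only enters through channel-wise modulation, so it can alter attributes such as beards or perceived gender without disturbing the spatial structure.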
