Emotion Manipulation for Talking-Head Videos via Facial Landmarks

Abstract

Manipulating the emotion of a performer in a video is a challenging task. The lip motion needs to be preserved while performing the desired changes in the emotion of the subject; however, simply utilizing existing image-based editing methods sabotages the original lip synchronization. We tackle this problem by utilizing a pretrained StyleGAN paired with a landmark-based editing module that modifies the bias present in the edit direction used in image manipulation. The proposed editing module consists of a latent-based landmark detection network and an editing network that modifies the editing direction to match the original lip synchronization while preserving the desired emotion manipulation results. This is realized by taking the facial landmarks as control points. Both networks operate on the latent space, which enables fast training and inference. We show that the proposed method runs significantly faster and performs better in terms of visual quality than alternative approaches, as validated through a perceptual study. The proposed method can also be extended to perform face reenactment to generate a talking-head video from a single image, and to perform face image manipulation using facial landmarks as control points.
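The core idea above — applying a semantic edit direction in StyleGAN's latent space while a landmark-conditioned network corrects that direction to keep the lips in sync — can be sketched as follows. This is a minimal illustration, not the authors' code; all names (`W_DIM`, `edit_direction`, `correction_net`) and the toy correction rule are assumptions.

```python
import numpy as np

# Minimal sketch of landmark-aware latent editing. A generic emotion
# direction is applied to the frame's latent code, and a stand-in
# "editing network" nudges the direction using the landmark residual.
W_DIM = 512  # typical StyleGAN w-space dimensionality (assumed)

rng = np.random.default_rng(0)
w = rng.standard_normal(W_DIM)               # latent code of a video frame
edit_direction = rng.standard_normal(W_DIM)  # generic "emotion" direction
edit_direction /= np.linalg.norm(edit_direction)

def correction_net(w, landmarks_src, landmarks_edited):
    """Toy stand-in for the latent editing network: pushes the edit
    direction back toward the source frame's mouth configuration."""
    residual = (landmarks_edited - landmarks_src).mean()  # scalar drift
    return -0.01 * residual * np.ones_like(w)

landmarks_src = rng.uniform(0.0, 1.0, (68, 2))  # 68-point layout assumed
landmarks_edited = landmarks_src + 0.05         # drift after a naive edit

alpha = 1.5  # edit strength
d_corrected = edit_direction + correction_net(w, landmarks_src, landmarks_edited)
w_edited = w + alpha * d_corrected
print(w_edited.shape)  # (512,)
```

Because both the landmark detector and the correction operate on latent codes rather than images, each frame costs one vector update plus one generator pass, which is where the speed advantage comes from.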

Similar Papers
  • Research Article
  • Cited by 41
  • 10.1109/tcsvt.2021.3083257
Image-to-Video Generation via 3D Facial Dynamics
  • May 27, 2021
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Xiaoguang Tu + 10 more

We present a versatile model, FaceAnime, for various video generation tasks from still images. Video generation from a single face image is an interesting problem and usually tackled by utilizing Generative Adversarial Networks (GANs) to integrate information from the input face image and a sequence of sparse facial landmarks. However, the generated face images usually suffer from quality loss, image distortion, identity change, and expression mismatching due to the weak representation capacity of the facial landmarks. In this paper, we propose to “imagine” a face video from a single face image according to the reconstructed 3D face dynamics, aiming to generate a realistic and identity-preserving face video with precisely predicted pose and facial expression. The 3D dynamics reveal changes of the facial expression and motion, and can serve as strong prior knowledge for guiding highly realistic face video generation. In particular, we explore face video prediction and exploit a well-designed 3D dynamic prediction network to predict a 3D dynamic sequence for a single face image. The 3D dynamics are then further rendered by the sparse texture mapping algorithm to recover structural details and sparse textures for generating face frames. Our model is versatile for various AR/VR and entertainment applications, such as face video retargeting and face video prediction. Extensive experimental results demonstrate its effectiveness in generating high-fidelity, identity-preserving, and visually pleasant face video clips from a single source face image.

  • Conference Article
  • 10.2991/icmemtc-16.2016.3
Face Joint Alignment Using Local Method
  • Jan 1, 2016
  • Gang Zhang + 2 more

Face alignment of a single image using local methods is an under-determined problem, although good results can be obtained by using an auxiliary model or prior information. In comparison, joint alignment using multiple face images of the same person has more advantages. In this paper, the rectangle around a face is acquired, and logistic regressors are then used to obtain candidate regions around the landmark points. Non-parametric face shape models are used to constrain the configuration among the regions. On this basis, Generalized Procrustes Analysis is used for rigid joint alignment. Tests are carried out on the LFW dataset, and the results show that joint alignment can cause overall drifting, but is helpful for aligning facial landmarks on the outer contour.

  • Research Article
  • Cited by 757
  • 10.1007/s11263-016-0940-3
Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks
  • Aug 10, 2016
  • International Journal of Computer Vision
  • Rasmus Rothe + 2 more

In this paper we propose a deep learning solution to age estimation from a single face image without the use of facial landmarks and introduce the IMDB-WIKI dataset, the largest public dataset of face images with age and gender labels. If the real age estimation research spans over decades, the study of apparent age estimation or the age as perceived by other humans from a face image is a recent endeavor. We tackle both tasks with our convolutional neural networks (CNNs) of VGG-16 architecture which are pre-trained on ImageNet for image classification. We pose the age estimation problem as a deep classification problem followed by a softmax expected value refinement. The key factors of our solution are: deep learned models from large data, robust face alignment, and expected value formulation for age regression. We validate our methods on standard benchmarks and achieve state-of-the-art results for both real and apparent age estimation.
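The "deep classification followed by a softmax expected value refinement" step above has a simple closed form: age is discretized into classes, and the prediction is the expectation of the class labels under the softmax distribution. The sketch below uses synthetic logits peaked around age 30 in place of real network outputs.

```python
import numpy as np

# Expected-value refinement over a 101-way age classification (0..100).
# The logits are a toy stand-in for CNN outputs, peaked at age 30.
ages = np.arange(101)
logits = -0.5 * ((ages - 30) / 5.0) ** 2      # synthetic, Gaussian-shaped

probs = np.exp(logits - logits.max())         # numerically stable softmax
probs /= probs.sum()

predicted_age = float((ages * probs).sum())   # E[age] under the softmax
print(round(predicted_age, 2))                # close to 30.0
```

Taking the expectation rather than the argmax turns the discrete classifier into a continuous regressor, which is why it outperforms plain classification on mean-absolute-error metrics.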

  • Conference Article
  • Cited by 2
  • 10.1109/iros47612.2022.9982284
From Local to Holistic: Self-supervised Single Image 3D Face Reconstruction Via Multi-level Constraints
  • Oct 23, 2022
  • Yawen Lu + 3 more

Single-image 3D face reconstruction with accurate geometric details is a critical and challenging task due to the similar appearance of the face surface and the fine details of facial organs. In this work, we introduce a self-supervised 3D face reconstruction approach from a single image that can recover detailed textures under different camera settings. The proposed network learns high-quality disparity maps from stereo face images during the training stage, while just a single face image is required to generate the 3D model in real applications. To recover fine details of each organ and the facial surface, the framework introduces facial landmark spatial consistency to constrain the learning process at the local point level, and a segmentation scheme on facial organs to constrain correspondences at the organ level. The face shape and textures are further refined by establishing holistic constraints based on varying light illumination and shading information. The proposed learning framework recovers more accurate 3D facial details, both quantitatively and qualitatively, than state-of-the-art 3DMM and geometry-based single-image reconstruction algorithms.

  • Conference Article
  • Cited by 33
  • 10.1109/btas.2012.6374581
3D face texture modeling from uncalibrated frontal and profile images
  • Sep 1, 2012
  • Hu Han + 1 more

3D face modeling from 2D face images is of significant importance for face analysis, animation and recognition. Previous research on this topic mainly focused on 3D face modeling from a single 2D face image; however, a single face image can only provide a limited description of a 3D face. In many applications, for example, law enforcement, multi-view face images are usually captured for a subject during enrollment, which makes it desirable to build a 3D face texture model, given a pair of frontal and profile face images. We first determine the correspondence between un-calibrated frontal and profile face images through facial landmark alignment. An initial 3D face shape is then reconstructed from the frontal face image, followed by shape refinement utilizing the depth information provided by the profile image. Finally, face texture is extracted by mapping the frontal face image on the recovered 3D face shape. The proposed method is utilized for 2D face recognition in two scenarios: (i) normalization of probe image, and (ii) enhancing the representation capability of gallery set. Experimental results comparing the proposed method with a state-of-the-art commercial face matcher and densely sampled LBP on a subset of the FERET database show the effectiveness of the proposed 3D face texture model.

  • Conference Article
  • Cited by 16
  • 10.1109/cvprw.2008.4563127
3D face reconstruction from a single 2D face image
  • Jun 1, 2008
  • Sung Won Park + 2 more

3D face reconstruction from a single 2D image is mathematically ill-posed. However, a variety of methods have been proposed to solve ill-posed problems in computer vision; some of the solutions estimate latent information or apply model-based approaches. In this paper, we propose a novel method to reconstruct a 3D face from a single 2D face image based on pose estimation and a deformable model of 3D face shape. For 3D face reconstruction from a single 2D face image, the first task is to estimate the depth lost by 2D projection of 3D faces. Applying the EM algorithm to facial landmarks in a 2D image, we propose a pose estimation algorithm to infer the pose parameters of rotation, scaling, and translation. After estimating the pose, much denser points are interpolated between the landmark points by a 3D deformable model and barycentric coordinates. As opposed to previous literature, our method can locate facial feature points automatically in a 2D facial image. Moreover, we also show that the proposed pose estimation method can be successfully applied to 3D face reconstruction. Experiments demonstrate that our approach produces reliable results for reconstructing photorealistic 3D faces.

  • Research Article
  • Cited by 16
  • 10.1016/j.neucom.2019.04.050
3D facial expression modeling based on facial landmarks in single image
  • May 10, 2019
  • Neurocomputing
  • Chenlei Lv + 3 more


  • Conference Article
  • Cited by 1
  • 10.1109/icalip.2012.6376701
3D model reconstruction and animation from single view face image
  • Jul 1, 2012
  • Narendra M Patel + 1 more

Multimedia applications and the demand for realistic animation in movie and game development require advanced techniques for character generation and for the dynamic movement of these characters to create realistic scenes. These applications need the construction of a 3D model of a character, specifically the face, followed by animation. In many applications a person-specific 3D face model is required. Generally, many views or images of a face are needed to construct its 3D model, but it is often not possible to obtain many views. In such cases, a 3D face model must be developed from a single-view face image. In this paper we propose an algorithm that automatically constructs a 3D model using a generic face model and a single-view face image. The algorithm first extracts features such as the eyes, mouth, eyebrows, and nose, and also determines the pose of the given single-view face image, in order to adapt a generic 3D face model to the given face image and construct a person-specific 3D face model. The person-specific 3D face is finally synthesized by texturing the individualized face model. Our algorithm successfully constructs 3D models from face images with different orientations, different illumination, hair on the head, and spectacles over the eyes.

  • Research Article
  • 10.1177/18761364251315239
MMSAD—A multi-modal student attentiveness detection in smart education using facial features and landmarks
  • Feb 21, 2025
  • Journal of Ambient Intelligence and Smart Environments
  • Ruchi Singh + 2 more

Virtual education (online education or e-learning) is a form of education where the primary mode of instruction is through digital platforms and the Internet. This approach offers flexibility and accessibility, making it attractive to many students. Many institutes also offer virtual professional courses for business and working professionals. However, ensuring the reachability of courses and evaluating students’ attentiveness presents significant challenges for educators teaching virtually. Various research works have been proposed to evaluate students’ attentiveness using facial landmarks, facial expressions, eye movements, gestures, postures, etc. However, no method has been proposed for real-time analysis and evaluation. This paper introduces a multi-modal student attentiveness detection (MMSAD) model designed to analyze and evaluate real-time class videos using two modalities: facial expressions and landmarks. Using a lightweight deep learning model, the model analyzes students’ emotions from facial expressions and identifies when a person is speaking during an online class by examining lip movements from facial landmarks. The model evaluates students’ emotions using five benchmark datasets, achieving accuracy rates of 99.05% on extended Cohn-Kanade (CK+), 87.5% on RAF-DB, 78.12% on Facial Emotion Recognition-2013 (FER-2013), 98.50% on JAFFE, and 88.01% on KDEF. The model identifies individuals speaking during the class using real-time class videos. The results from these modalities are used to predict attentiveness, categorizing students as either attentive or inattentive.
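The speaking-detection modality described above — identifying lip movement from facial landmarks — is commonly implemented with a mouth-aspect-ratio (MAR) computed over time. The sketch below is an illustrative stand-in for the MMSAD pipeline, not its actual code; the 68-point iBUG landmark layout (indices 60..67 for the inner-lip ring) and the variance threshold are assumptions.

```python
import numpy as np

def mouth_aspect_ratio(pts):
    """pts: (68, 2) landmark array (iBUG layout assumed).
    Ratio of inner-lip vertical gaps to mouth width."""
    a = np.linalg.norm(pts[61] - pts[67])
    b = np.linalg.norm(pts[62] - pts[66])
    c = np.linalg.norm(pts[63] - pts[65])
    width = np.linalg.norm(pts[60] - pts[64])
    return (a + b + c) / (3.0 * width)

def is_speaking(mar_per_frame, thresh=0.05):
    """Large MAR variation over a window suggests active speech;
    a flat MAR trace suggests a closed or static mouth."""
    return float(np.std(mar_per_frame)) > thresh
```

In a real-time class video the MAR would be computed per frame from a landmark detector's output, and the per-window decision combined with the emotion branch to produce the attentive/inattentive label.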

  • Conference Article
  • Cited by 5
  • 10.1109/iceit.2010.5608429
3D face reconstruction using a single 2D face image
  • Sep 1, 2010
  • Yue Ming + 2 more

In this paper, a novel framework for 3D face reconstruction from a single 2D face image is proposed. We focus on generating a 3D face model without expensive devices or complicated calculation. First, we preprocess the 2D face image, including illumination compensation, face detection, and feature point extraction. The method is then based on a 3D morphable face model that encodes shape and texture in terms of model parameters. The prior 3D face model is a linear combination of “eigenheads” obtained by applying PCA to a training set of laser-scanned 3D faces. To account for pose and illumination variations, the algorithm simulates the process of image formation in 3D and estimates the 3D shape and texture of faces from a single image. As opposed to previous literature, our method can locate facial feature points automatically in a 2D facial image. Moreover, we also show that our pose estimation method can be successfully applied to 3D face reconstruction. In our experiments, the proposed method performs satisfactorily with respect to computation time and the reconstruction of photorealistic 3D faces.
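The "linear combination of eigenheads" prior above is ordinary PCA over flattened 3D scans: a new face is the mean shape plus a weighted sum of principal shape directions. The sketch below uses random stand-ins for the laser-scanned training set; the sizes and coefficients are illustrative.

```python
import numpy as np

# Toy morphable-model prior: mean shape + PCA "eigenheads".
rng = np.random.default_rng(1)
n_verts, n_train = 200, 10
scans = rng.standard_normal((n_train, n_verts * 3))  # flattened (x,y,z)

mean_shape = scans.mean(axis=0)

# PCA via SVD of the centered training matrix; rows of Vt are the
# principal shape directions ("eigenheads").
_, sing_vals, eigenheads = np.linalg.svd(scans - mean_shape,
                                         full_matrices=False)

coeffs = np.zeros(len(sing_vals))
coeffs[:3] = [0.5, -0.2, 0.1]       # model parameters for a new face
new_face = mean_shape + coeffs @ eigenheads
print(new_face.shape)               # (600,)
```

Fitting then amounts to searching for the coefficient vector (plus pose and illumination parameters) that best reproduces the input image when the model is rendered.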

  • Conference Article
  • Cited by 1
  • 10.1109/icip.2004.1419764
Face identification from one single sample face image
  • Jan 1, 2016
  • Hung-Son Le + 1 more

This paper addresses a challenging face recognition problem: face identification from one single face image. We present a novel approach to face identification that is capable of identifying a person from face images that differ significantly from the sample image in terms of illumination, camera view angle, and expression. The approach is based on a new measurement of dissimilarity between two face images. A person is identified based on the smallest dissimilarity, which is the summation of the dissimilarities of all pairs of observations extracted from the face image in both the vertical and horizontal directions. Our experimental results on both the AR face database and the CMU PIE face database show that the proposed method outperforms PCA-, LDA-, and LFA-based approaches.
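The identification rule above reduces to a nearest-dissimilarity search: sum the pairwise observation dissimilarities against each gallery subject and pick the smallest total. The sketch below is illustrative only; a plain L1 distance stands in for the paper's dissimilarity measure, which it is not.

```python
import numpy as np

def identify(probe_obs, gallery):
    """probe_obs: (k, d) observations from the probe face image.
    gallery: {person_id: (k, d) observations from the sample image}.
    Returns the person with the smallest summed dissimilarity."""
    scores = {pid: float(np.abs(probe_obs - obs).sum())  # L1 stand-in
              for pid, obs in gallery.items()}
    return min(scores, key=scores.get)
```

With one sample per person there is no within-class statistics to learn, which is why such direct dissimilarity schemes are used instead of PCA- or LDA-style subspace training.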

  • Research Article
  • Cited by 13
  • 10.1016/j.patcog.2018.03.002
Monocular 3D facial shape reconstruction from a single 2D image with coupled-dictionary learning and sparse coding
  • Mar 6, 2018
  • Pattern Recognition
  • Pengfei Dou + 3 more


  • Conference Article
  • 10.1109/iceic57457.2023.10049847
Audio-to-Facial Landmarks Generator for Talking Face Video Synthesis
  • Feb 5, 2023
  • Dasol Jeong + 2 more

Audio-driven talking face methods have been studied to improve the accuracy of lip synchronization. However, creating head-pose movement and personalized facial features remains a challenging problem. Solving it requires identifying the context from the audio, generating the head pose and lip motion, and synthesizing a personalized face. We introduce a facial landmark generation method that produces audio-based head pose and lip motion using an audio transformer. The audio transformer extracts audio features containing contextual information and generates generalized head-pose and lip-motion landmarks. To synthesize personalized features on the generated landmarks, a talking face video is generated by applying a method learned through meta-learning. With just a few images, even unseen faces can be made to speak the desired audio. In addition, the proposed method is applicable to various languages, and enables photo-realistic synthesis and fast inference.

  • Research Article
  • Cited by 133
  • 10.1016/j.eswa.2015.10.047
Fully automatic face normalization and single sample face recognition in unconstrained environments
  • Nov 19, 2015
  • Expert Systems with Applications
  • Mohammad Haghighat + 2 more


  • Book Chapter
  • Cited by 4
  • 10.1007/978-3-642-03270-7_20
Photo Realistic 3D Cartoon Face Modeling Based on Active Shape Model
  • Jan 1, 2009
  • Zhigeng Pan + 6 more

We present a novel framework to automatically build a 3D cartoon face model from a single frontal face image. We use an improved ASM algorithm to automatically detect the deformed key feature control points in the face image. The deformation of the control points is compared to that of a standard (average) face, and exaggerated based on the face shape and the type and spacing of the facial organs. RBF-based smooth interpolation is used to generate a 3D model from the exaggerated control points. The resulting 3D human face model not only preserves the identity of the subject in the photo, but also looks cartoonish, with exaggerated facial features. Experiments with a large number of real photographs show that our framework is feasible.
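The RBF-based smooth interpolation step mentioned above can be sketched as follows: displacements known at a few exaggerated control points are spread smoothly to arbitrary mesh vertices through a radial kernel. This is a generic illustration, not the chapter's implementation; the Gaussian kernel, its width, and the toy point sets are assumptions.

```python
import numpy as np

def rbf_interpolate(ctrl_pts, ctrl_disp, query_pts, sigma=1.0):
    """ctrl_pts: (n, 2) control points; ctrl_disp: (n, 2) known
    displacements; query_pts: (m, 2). Returns (m, 2) displacements."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    K = kernel(ctrl_pts, ctrl_pts)
    # small ridge term keeps the linear solve numerically stable
    weights = np.linalg.solve(K + 1e-8 * np.eye(len(ctrl_pts)), ctrl_disp)
    return kernel(query_pts, ctrl_pts) @ weights
```

Because the Gaussian kernel matrix on distinct points is positive definite, the interpolant reproduces the prescribed displacements exactly at the control points while varying smoothly in between, which is what keeps the exaggerated face from tearing.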
