Abstract

Accurate head pose estimation from 2D image data is an essential component of applications such as driver monitoring systems, virtual reality technology, and human-computer interaction, as it enables a better determination of user engagement and attentiveness. The most accurate head pose estimators are based on Deep Neural Networks trained in a supervised manner, so their performance depends primarily on the accuracy of the training data. Acquiring real head pose data with a wide variation of yaw, pitch and roll is a challenging task, and publicly available head pose datasets have limitations with respect to size, resolution, annotation accuracy and diversity. In this work, a methodology is proposed to generate pixel-perfect synthetic 2D headshot images, rendered from high-quality 3D synthetic facial models, with accurate head pose annotations and a diverse range of variation in age, race, and gender. The resulting dataset comprises more than 300k pairs of RGB images and corresponding head pose annotations, covering a wide range of variation in pose, illumination and background. The dataset is evaluated by training a state-of-the-art head pose estimation model and testing against the popular evaluation dataset Biwi. The results show that training with purely synthetic data generated using the proposed methodology comes close to the state-of-the-art in head pose estimation, where models are originally trained on real human facial datasets. Because there is a domain gap between the synthetic and real-world images in the feature space, the initial experimental results fall short of the current state-of-the-art. To reduce this domain gap, a semi-supervised visual domain adaptation approach is proposed, which trains simultaneously on the labelled synthetic data and the unlabelled real data. When domain adaptation is applied, a significant improvement in model performance is achieved.
Additionally, by applying a data fusion-based transfer learning approach, better results are achieved than in previously published work on this topic.
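The semi-supervised adaptation step described above can be pictured with a DANN-style gradient reversal: a shared feature extractor minimises the supervised pose loss on labelled synthetic samples while maximising the loss of a domain classifier that tries to separate synthetic from real features. The toy NumPy sketch below illustrates only the idea on linear models; all names, dimensions and hyperparameters are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (illustrative only): labelled synthetic features and
# unlabelled real features drawn from slightly shifted distributions.
Xs = rng.normal(0.0, 1.0, (64, 8))           # synthetic inputs (labelled)
ys = Xs @ rng.normal(size=8)                 # toy continuous "pose" targets
Xr = rng.normal(0.5, 1.0, (64, 8))           # real inputs (no pose labels)

W = rng.normal(scale=0.5, size=(8, 4))       # shared linear feature extractor
v = rng.normal(scale=0.5, size=4)            # pose regressor head
u = rng.normal(scale=0.5, size=4)            # domain classifier head
lr, lam = 0.1, 0.1                           # step size, reversal strength

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

mse_before = np.mean((Xs @ W @ v - ys) ** 2)

for _ in range(300):
    Fs, Fr = Xs @ W, Xr @ W                  # shared features, both domains
    # Supervised regression loss on the labelled synthetic batch only.
    err = Fs @ v - ys
    grad_v = Fs.T @ err / len(ys)
    grad_W_task = Xs.T @ np.outer(err, v) / len(ys)
    # Domain classifier: predict synthetic (0) vs real (1) from features.
    F_all = np.vstack([Fs, Fr])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xr))])
    p = sigmoid(F_all @ u)
    grad_u = F_all.T @ (p - d) / len(d)
    grad_W_dom = np.vstack([Xs, Xr]).T @ np.outer(p - d, u) / len(d)
    v -= lr * grad_v                         # head fits pose on synthetic data
    u -= lr * grad_u                         # classifier learns to split domains
    # Gradient reversal: the extractor *ascends* the domain loss, pushing
    # the learned features to become indistinguishable across domains.
    W -= lr * (grad_W_task - lam * grad_W_dom)

mse_after = np.mean((Xs @ W @ v - ys) ** 2)
```

Because the extractor's update subtracts `lam * grad_W_dom` (ascent on the domain loss) while the classifier head `u` descends it, the shared features drift toward a representation the domain classifier cannot separate, while the regression head still fits the labelled synthetic targets.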

Highlights

  • Head Pose Estimation (HPE) continues to be an active area of research in the computer vision (CV) domain because of its diverse applications across a range of CV technologies

  • The only prior work addressing domain adaptation for the regression task of HPE is by Kuhnke and Ostermann [42], who reduce negative transfer from source outliers by generating source-sample weights during training and propose Partial Adversarial Domain Adaptation for Continuous label spaces (PADACO)

  • EVALUATION OF THE DATA: first, the details of the state-of-the-art model used in this work to evaluate the effectiveness of the generated synthetic data are discussed, including the domain adaptation module added to the existing model architecture


Summary

INTRODUCTION

Head Pose Estimation (HPE) continues to be an active area of research in the computer vision (CV) domain because of its diverse applications across a range of CV technologies. Published works use different modalities, such as depth information [2]–[5], inertial measurement units (IMU) [6] or video sequences [7], as a cue to map the features extracted from the 2D image to 3D space. These methods require more computation and additional sensors, which are not always available. Generating synthetic facial images with Computer Graphics (CG) software provides an inexpensive and sufficient amount of accurately labelled data with comparatively low effort and complexity, as the head models, camera parameters and positions, scene illumination and other constraints can be controlled within the 3D environment.
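Because the head models are posed programmatically inside the 3D environment, each rendered image's ground-truth annotation can be read directly from the rotation applied to the model. The NumPy sketch below shows one common convention (an intrinsic Z-Y-X rotation); the paper's exact axis conventions are not reproduced here, so treat this purely as an illustration:

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Intrinsic Z-Y-X rotation from Euler angles in radians.
    Axis conventions vary between datasets; this is one common choice."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

def euler_from_matrix(R):
    """Recover (yaw, pitch, roll); valid while |pitch| < 90 deg (no gimbal lock)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = -np.arcsin(R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.array([yaw, pitch, roll])

# Pose the head model with known angles, render the frame, and store the
# same angles as the image's ground-truth annotation.
annotation = np.radians([30.0, -15.0, 5.0])   # yaw, pitch, roll
R = rotation_matrix(*annotation)              # rotation applied to the model
```

Round-tripping the angles through `rotation_matrix` and `euler_from_matrix` recovers them exactly for pitch inside (−90°, 90°); at ±90° the decomposition hits gimbal lock and needs special handling.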

HEAD POSE ESTIMATION METHODS
A COMPARISON OF DIFFERENT HEAD POSE DATASETS
AVAILABLE HEAD POSE DATASETS
DATA GENERATION METHODOLOGY
DATASET DETAILS
SYNTHETIC TO REAL DOMAIN ADAPTATION
DISCUSSION
CONCLUSION AND FUTURE WORK