Abstract
Accurate head pose estimation from 2D image data is an essential component of applications such as driver monitoring systems, virtual reality technology, and human-computer interaction. It enables a better determination of user engagement and attentiveness. The most accurate head pose estimators are based on Deep Neural Networks that are trained with a supervised approach and rely primarily on the accuracy of the training data. Acquiring real head pose data with a wide variation of yaw, pitch and roll is a challenging task, and publicly available head pose datasets have limitations with respect to size, resolution, annotation accuracy and diversity. In this work, a methodology is proposed to generate pixel-perfect synthetic 2D headshot images rendered from high-quality 3D synthetic facial models with accurate head pose annotations. A diverse range of variations in age, race, and gender is also provided. The resulting dataset includes more than 300k pairs of RGB images with corresponding head pose annotations, covering a wide range of variations in pose, illumination and background. The dataset is evaluated by training a state-of-the-art head pose estimation model and testing against the popular evaluation dataset Biwi. The results show that a model trained purely on synthetic data generated with the proposed methodology comes close to the state-of-the-art results of models trained on real human facial datasets. Because there is a domain gap between synthetic and real-world images in the feature space, these initial experimental results still fall short of the current state of the art. To reduce the domain gap, a semi-supervised visual domain adaptation approach is proposed, which trains simultaneously on the labelled synthetic data and the unlabelled real data. When domain adaptation is applied, a significant improvement in model performance is achieved. Additionally, by applying a data fusion-based transfer learning approach, better results are achieved than in previously published work on this topic.
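As a reading aid, training on labelled synthetic data together with unlabelled real data is often written as a joint objective of the following generic form. This is a sketch under stated assumptions, not the formulation used in this work: the symbols $f_\theta$ (pose regressor), $\ell_{\text{pose}}$ (supervised pose loss), $\mathcal{L}_{\text{DA}}$ (a domain alignment term) and the weight $\lambda$ are all introduced here for illustration.

```latex
% Generic semi-supervised domain adaptation objective (illustrative only).
% x_i^s, y_i^s : synthetic (source) images and their head pose labels
% x_j^t       : unlabelled real (target) images
\[
\mathcal{L}(\theta)
  = \frac{1}{N_s} \sum_{i=1}^{N_s}
      \ell_{\text{pose}}\bigl(f_\theta(x_i^s),\, y_i^s\bigr)
  + \lambda \, \mathcal{L}_{\text{DA}}\bigl(\{x_i^s\}_{i=1}^{N_s},\, \{x_j^t\}_{j=1}^{N_t};\, \theta\bigr)
\]
```

The first term uses only the synthetic labels, while the second term sees both domains without needing real-image labels; $\lambda$ trades off pose accuracy on synthetic data against alignment of the two feature distributions.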
Highlights
Head Pose Estimation (HPE) continues to be an active area of research in the computer vision (CV) domain because of its diverse applications across a range of CV technologies
The only work that deals with domain adaptation for a regression task in HPE is by Kuhnke and Ostermann [42], who propose Partial Adversarial Domain Adaptation for Continuous label spaces (PADACO), which reduces negative transfer from source outliers by generating source sampler weights during training
Evaluation of the data: first, the details of the state-of-the-art model used in this work to evaluate the effectiveness of the generated synthetic data are discussed, including the domain adaptation module that is added to the existing model architecture
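To make the evaluation step concrete, the sketch below scores predicted head poses against ground-truth annotations using mean absolute error (MAE) per angle. The text above does not name the metric, so MAE over yaw, pitch and roll in degrees is an assumption here (it is a common choice for head pose benchmarks), and the function names `wrapped_diff` and `pose_mae` are hypothetical.

```python
def wrapped_diff(a_deg, b_deg):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def pose_mae(pred, true):
    """Mean absolute error per angle for sequences of (yaw, pitch, roll).

    Assumption: angles are Euler angles in degrees; the metric choice is
    illustrative and not taken from the source text.
    """
    n = len(pred)
    if n == 0 or n != len(true):
        raise ValueError("pred and true must be non-empty and equal in length")
    sums = [0.0, 0.0, 0.0]
    for p, t in zip(pred, true):
        for k in range(3):
            # Wrapping makes 179 and -179 count as 2 degrees apart, not 358.
            sums[k] += abs(wrapped_diff(p[k], t[k]))
    return tuple(s / n for s in sums)


# Made-up example values, not results from any dataset.
pred = [(10.0, -5.0, 2.0), (179.0, 0.0, -1.0)]
true = [(12.0, -4.0, 2.0), (-179.0, 3.0, 1.0)]
yaw_mae, pitch_mae, roll_mae = pose_mae(pred, true)
overall_mae = (yaw_mae + pitch_mae + roll_mae) / 3.0
```

Reporting the three per-angle errors alongside their average makes it visible when a model is weak on one axis, for example large yaw, even if the overall figure looks acceptable.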
Summary
Head Pose Estimation (HPE) continues to be an active area of research in the computer vision (CV) domain because of its diverse applications across a range of CV technologies. Published works use different modalities such as depth information [2]–[5], inertial measurement units (IMU) [6] or video sequences [7] as a cue to map the features extracted from the 2D image to 3D space. These methods require more computation and additional sensors, which are not always available. Generating synthetic facial images with Computer Graphics (CG) software provides a sufficient amount of accurately labelled data inexpensively and with comparatively low effort and complexity, because the head models, camera parameters and positions, scene illumination and other constraints can be controlled within the 3D environment.