Abstract

Human head pose estimation from images plays a vital role in applications such as driver assistance systems and human behavior analysis. Head pose estimation networks are typically trained in a supervised manner. Unfortunately, manual and sensor-based annotations of head poses are prone to errors. One solution is supervised training on synthetic data generated from 3D face models, which can provide an unlimited amount of error-free labels. However, computer-generated face images are only an approximation of real-world images, which results in a domain gap between the training and application domains. To date, domain adaptation has rarely been addressed in work on head pose estimation. In this work, we propose relative pose consistency, a semi-supervised learning strategy for head pose estimation based on consistency regularization. It allows simultaneous learning on labeled synthetic data and unlabeled real-world data to overcome the domain gap while keeping the advantages of synthetic data. Consistency regularization enforces consistent network predictions under random image augmentations. We address both pose-preserving and pose-altering augmentations. Naturally, pose-altering augmentations cannot be applied to unlabeled data directly. We therefore propose a strategy that exploits the known relative pose a pose-altering augmentation introduces between an augmented image pair. This allows the network to benefit from relative pose labels during training on unlabeled real-world images. We evaluate our approach on the widely used Biwi Kinect Head Pose benchmark and outperform the domain-adaptation state of the art. To the best of our knowledge, we are the first to present a consistency regularization framework for head pose estimation. Our experiments show that our approach improves head pose estimation accuracy on real-world images despite using pose labels only from synthetic images.
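The relative pose consistency idea can be illustrated with a short sketch. The following minimal PyTorch example is not the paper's implementation, only a hypothetical illustration under stated assumptions: the pose-altering augmentation is an in-plane image rotation, the network `model` predicts (roll, pitch, yaw) in degrees, and the function name `relative_pose_consistency_loss` is invented for this example. For an unlabeled image, rotating it by a known angle produces a second view whose relative pose to the original is known, so the difference between the two predictions can be supervised even though the absolute pose label is missing.

```python
# Minimal sketch of relative pose consistency on unlabeled images.
# Assumptions (not from the paper's code): `model` maps a batch of face
# crops (B, C, H, W) to (B, 3) pose angles (roll, pitch, yaw) in degrees,
# and in-plane rotation shifts only the roll angle.
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def relative_pose_consistency_loss(model, images):
    """Unsupervised loss: the predicted pose *difference* between two
    differently rotated views must match the known relative rotation."""
    # Sample a random in-plane rotation per image (pose-altering augmentation).
    angles = torch.empty(images.size(0)).uniform_(-30.0, 30.0)
    view_a = torch.stack(
        [TF.rotate(img, float(a)) for img, a in zip(images, angles)]
    )
    view_b = images  # second view: the unrotated original

    pred_a = model(view_a)  # (B, 3): roll, pitch, yaw
    pred_b = model(view_b)

    # Known relative pose between the views: roll differs by `angles`
    # (sign depends on the roll convention; flip if needed), while pitch
    # and yaw are unchanged by an in-plane rotation.
    zeros = torch.zeros_like(angles)
    target_delta = torch.stack([angles, zeros, zeros], dim=1)

    return F.mse_loss(pred_a - pred_b, target_delta.to(pred_a.device))
```

In a semi-supervised setup of this kind, such a term would be added to the supervised loss computed on the labeled synthetic images, so the unlabeled real-world images contribute gradient signal without ever needing absolute pose labels.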
