Abstract

Head pose estimation is a fundamental task in many human–computer interaction applications. Most of these applications demand accurate six-degrees-of-freedom head pose estimation (6DoF-HPE) under full-range angles and take sequential images of the human head as input. However, most existing head pose estimation methods focus on three-degrees-of-freedom (3DoF) estimation of frontal heads, which restricts their use in real-world scenarios. This study presents a framework that estimates head pose without landmark localization. The novelty of our framework lies in estimating 6DoF head poses under full-range angles in real time. The proposed framework leverages deep neural networks, detecting human heads with a Single Shot MultiBox Detector (SSD) and predicting their angles with a RepVGG-B1g4 backbone. It uses red, green, blue, and depth (RGB-D) data to estimate the rotational and translational components relative to the camera pose. The framework employs a continuous rotation representation to predict the angles and a multi-loss training strategy; the regression loss combines the geodesic loss with the mean squared error (MSE). Ground-truth labels for full-range head angles were extracted from the public Carnegie Mellon University (CMU) Panoptic dataset. This study provides a comprehensive comparison with state-of-the-art methods on public benchmark datasets. Experiments demonstrate that the proposed method matches or outperforms state-of-the-art methods. The code and datasets are available at https://github.com/Redhwan-A/6DoFHPE.
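
For illustration only (this is not the authors' released code), the sketch below shows how the two components named in the abstract might look in PyTorch: a continuous 6D rotation representation decoded into a valid rotation matrix via Gram–Schmidt orthogonalization, and a multi-loss objective combining a geodesic rotation loss with an MSE translation loss. The weighting term `alpha` is a hypothetical parameter, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def rotation_6d_to_matrix(x6: torch.Tensor) -> torch.Tensor:
    """Decode a continuous 6D rotation representation into a 3x3 rotation matrix."""
    a1, a2 = x6[..., :3], x6[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    # Gram-Schmidt: remove the b1 component from a2, then normalize
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack((b1, b2, b3), dim=-2)

def geodesic_loss(R_pred: torch.Tensor, R_gt: torch.Tensor) -> torch.Tensor:
    """Mean geodesic angle (radians) between batches of 3x3 rotation matrices."""
    R_rel = torch.bmm(R_pred.transpose(1, 2), R_gt)
    # trace(R_rel) = 1 + 2*cos(theta); clamp for numerical safety
    trace = R_rel.diagonal(dim1=1, dim2=2).sum(-1)
    cos_theta = torch.clamp((trace - 1.0) / 2.0, -1.0 + 1e-7, 1.0 - 1e-7)
    return torch.acos(cos_theta).mean()

def pose_loss(R_pred, R_gt, t_pred, t_gt, alpha: float = 1.0):
    """Multi-loss: geodesic rotation loss plus MSE translation loss.

    `alpha` is a hypothetical weighting term, not specified in the abstract.
    """
    return geodesic_loss(R_pred, R_gt) + alpha * F.mse_loss(t_pred, t_gt)

if __name__ == "__main__":
    x6 = torch.randn(4, 6)               # raw 6D network outputs
    R_pred = rotation_6d_to_matrix(x6)   # (4, 3, 3) valid rotations
    R_gt = torch.eye(3).expand(4, 3, 3)  # dummy ground-truth rotations
    t_pred, t_gt = torch.randn(4, 3), torch.randn(4, 3)
    print(pose_loss(R_pred, R_gt, t_pred, t_gt).item())
```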
