Robust head pose estimation significantly improves the performance of face-analysis applications in Cyber-Physical Systems (CPS), such as driving assistance and expression recognition. However, this task poses two main challenges: large pose variations and the inhomogeneity of the facial feature space. Large pose variations can render distinctive facial features, such as the nose or lips, invisible, especially in extreme cases. Additionally, features extracted from a head do not change in a stationary manner with respect to the head pose, which results in an inhomogeneous feature space. To address these problems, we propose an end-to-end framework that estimates the head pose from a single depth image. Specifically, the PointNet network is adopted to automatically select distinctive facial feature points from the visible surface of the head and to extract discriminative features. A Deep Regression Forest is then used to handle the nonstationary property of the facial feature space and to learn the head pose distributions. Experimental results show that our proposed method achieves state-of-the-art performance on the Biwi Kinect Head Pose Dataset, the Pandora Dataset, and the ICT-3DHP Dataset.
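The key property that lets PointNet select features directly from an unordered set of surface points is its symmetric aggregation: a shared per-point transformation followed by a max pool, which makes the global feature invariant to the ordering of the input points. The following is a minimal NumPy sketch of that idea only; the layer sizes, weights, and function names are illustrative assumptions, not the authors' actual network.

```python
import numpy as np

def pointnet_global_feature(points, W, b):
    """PointNet-style symmetric feature (illustrative sketch).

    points: (N, 3) array of 3D points from the visible head surface.
    W, b:   weights of a single shared linear layer (a stand-in for
            the shared multi-layer perceptron in PointNet).
    """
    h = np.maximum(points @ W + b, 0.0)  # shared MLP applied to every point
    return h.max(axis=0)                 # max pool -> order-invariant global feature

# Demonstrate permutation invariance on random points (hypothetical data).
rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
W = rng.normal(size=(3, 16))
b = rng.normal(size=16)

f1 = pointnet_global_feature(pts, W, b)
f2 = pointnet_global_feature(pts[::-1], W, b)  # same points, reversed order
```

Because the max pool discards point order, `f1` and `f2` are identical; in the proposed framework, such a global feature would then feed the Deep Regression Forest that models the head pose distribution.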