Abstract

Head pose estimation, a crucial task in computer vision, involves determining the orientation of a person's head in 3D space through yaw, pitch, and roll angles. While recent techniques achieve excellent results when estimating head pose from a single 2D RGB image in which the head faces the camera directly, few methods exist for pose estimation from arbitrary viewpoints. This limitation is accentuated when the input data is three-dimensional, such as head models reconstructed from magnetic resonance imaging, where accurate pose estimation is necessary for diagnostic purposes. To overcome these limitations, we take a first step by proposing a method for fine-grained head pose estimation across the full range of yaw angles using synthetic 3D head models. Our approach transforms the 3D pose estimation problem into a multi-class 2D image classification problem by representing 3D head models as multi-view projection images. Leveraging a fine-tuned ResNet50 convolutional neural network, we tackle head pose estimation with a fine granularity of 5°, effectively discretizing the 360° range of yaw orientations into 72 classes. To evaluate our proposal, we train and test our models on the publicly available FaceScape and 3D BIWI datasets, obtaining promising results.
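
The following is a minimal sketch (not the authors' released code) of the classification formulation described above: yaw angles are discretized into 5° bins and an ImageNet-pretrained ResNet50 is fine-tuned to predict the bin index from a rendered projection image. The training-step function, input sizes, and optimizer settings are illustrative assumptions.

```python
# Sketch: yaw binning at 5 degrees and ResNet50 fine-tuning (assumptions noted above).
import torch
import torch.nn as nn
from torchvision import models

NUM_BINS = 360 // 5  # 72 classes covering the full yaw range at 5 degree granularity

def yaw_to_class(yaw_deg: float) -> int:
    """Map a yaw angle in degrees to its 5 degree bin index in [0, 71]."""
    return int(yaw_deg % 360) // 5

# ImageNet-pretrained ResNet50 with its final layer replaced by a 72-way classifier.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_BINS)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # hypothetical learning rate

def train_step(images: torch.Tensor, yaw_labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of multi-view projection images.

    images: [B, 3, 224, 224] tensors; yaw_labels: [B] bin indices from yaw_to_class.
    """
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, yaw_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```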
