Abstract

Head pose estimation is a crucial initial task in human face analysis and is employed in several computer vision systems, such as facial expression recognition, head gesture recognition, and yawn detection. In this work, we propose a frame-based approach that estimates the head pose on top of the Viola and Jones (VJ) Haar-like face detector. Several appearance-based and depth-based feature types are employed for the pose estimation, and we compare them in terms of accuracy and speed. We show that using depth data improves the accuracy of the head pose estimation. Additionally, we can spot positive detections, faces in profile views detected by the frontal model, that are wrongly cropped due to background disturbances. We introduce a new depth-based feature descriptor that provides competitive estimation results with a lower computation time. Evaluation on a benchmark Kinect database shows that the histogram of oriented gradients and the developed depth-based features are more distinctive for head pose estimation, and they compare favorably to current state-of-the-art approaches. Using a concatenation of the aforementioned feature types, we achieved a head pose estimation with average errors not exceeding for pitch, yaw and roll angles, respectively.
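
To make the histogram-of-oriented-gradients idea concrete, the following is a minimal sketch of a single-cell orientation histogram and of feature concatenation. It is not the authors' implementation: the function names `orientation_histogram` and `concatenate_features`, the choice of 9 unsigned orientation bins, and the plain L2 normalization are all assumptions made for illustration, and a full descriptor would additionally divide the face patch into cells and normalize over blocks.

```python
import math


def orientation_histogram(patch, n_bins=9):
    """Single-cell histogram of oriented gradients (illustrative only).

    patch: 2D grayscale image given as a list of equal-length rows.
    Returns n_bins magnitude-weighted votes over unsigned orientations
    in [0, 180) degrees, L2-normalized (all zeros for a flat patch).
    """
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences on interior pixels; border pixels are skipped.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the signed angle into the unsigned range [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against a rounding result of exactly 180.
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def concatenate_features(*descriptors):
    """Join several feature vectors into one flat vector."""
    combined = []
    for descriptor in descriptors:
        combined.extend(descriptor)
    return combined


# Toy usage: the same histogram applied to a gray patch and a depth patch.
gray_patch = [[0, 0, 10, 10] for _ in range(4)]       # vertical edge
depth_patch = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]  # horizontal edge
feature_vector = concatenate_features(
    orientation_histogram(gray_patch),
    orientation_histogram(depth_patch),
)
```

In this sketch the concatenated vector is what a regressor would take as input. Applying the gradient histogram to a depth patch is only a stand-in here and should not be read as the depth-based descriptor introduced in the paper.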

Highlights

  • Head pose estimation is considered the first step in several computer vision systems, such as facial expression recognition, face recognition, head gesture recognition, gaze recognition, and driver monitoring.

  • Temporal-dependent approaches rely strongly on the initialization step, and most of them assume that tracking starts from the frontal pose with approximately zero rotation angles. This assumption does not always hold in real scenarios and would cause a fixed offset error. Some of these approaches employ the frontal model of the Viola and Jones (VJ) face detector [26] to start the tracking with zero rotation angles; however, this detector is capable of detecting faces across a wide range of poses, so a detection does not guarantee a frontal face.

  • Head pose estimation is crucial for many advanced facial analysis tasks in various computer vision systems, such as facial expression recognition, head gesture recognition, gaze recognition, and driver monitoring.
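
The fixed offset error mentioned in the highlights can be illustrated with a small sketch. It assumes an idealized tracker that measures frame-to-frame yaw changes perfectly and only gets the starting angle wrong; the function `track_yaw` and all numeric values are hypothetical and are not taken from the paper.

```python
def track_yaw(initial_yaw, frame_deltas):
    """Accumulate per-frame yaw changes (degrees) from a starting angle."""
    estimates = [initial_yaw]
    for delta in frame_deltas:
        estimates.append(estimates[-1] + delta)
    return estimates


# Hypothetical sequence: the face actually starts at 15 degrees of yaw,
# but the tracker is initialized under the zero-rotation assumption.
frame_deltas = [2.0, -1.0, 3.5]
true_yaw = track_yaw(15.0, frame_deltas)
tracked_yaw = track_yaw(0.0, frame_deltas)

# Even with perfect relative motion, every frame is off by the same amount.
errors = [est - truth for est, truth in zip(tracked_yaw, true_yaw)]
print(errors)  # [-15.0, -15.0, -15.0, -15.0]
```

A frame-based estimator avoids this failure mode because each frame is estimated independently, so no initial pose has to be assumed.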


Summary

Introduction

Head pose estimation is considered the first step in several computer vision systems, such as facial expression recognition, face recognition, head gesture recognition, gaze recognition, and driver monitoring. Considering different poses, Moore and Bowden [2] developed a texture-based approach to perform multi-view facial expression recognition. A Gaussian process regression (CSGPR) model has also been used for head pose normalization to perform pose-invariant facial expression recognition. Continuous estimation of the head pose over an image sequence is an essential task for head gesture recognition. Over the last two decades, a number of approaches have been proposed to tackle face pose estimation from 2D/3D facial data. These approaches can be categorized according to several criteria, such as temporal dependency, estimation continuity, and data source.

Temporal Dependency
Data Source
The Proposed Approach
Face Detection
Feature Extraction
Appearance-Based Features
Depth-Based Features
Machine Learning Approach
Experimental Results
Experiment 1
Experiment 2
Processing Time
Conclusions and Future Work