Abstract

Humans can interact with several kinds of machines (motor vehicles, robots, among others) in different ways; one of these is through head pose. In this work, we propose a head pose estimation framework that combines 2D and 3D cues using the concept of key frames (KFs). KFs are a set of frames, learned automatically offline, that consist of the following: 2D features, encoded through Speeded Up Robust Feature (SURF) descriptors; 3D information, captured by Fast Point Feature Histogram (FPFH) descriptors; and the target’s head orientation (pose) in real-world coordinates, represented through a 3D facial model. The KF information is then reinforced through a global optimization process that minimizes error in a way similar to bundle adjustment. In an online process, the KFs allow us to formulate a hypothesis of the head pose in new images, which is then refined through an optimization process performed by the iterative closest point (ICP) algorithm. This KF-based framework can handle partial occlusions and extreme rotations even with noisy depth data, improving pose estimation accuracy and detection rate. We evaluate the proposal on two public state-of-the-art benchmarks: (1) the BIWI Kinect Head Pose Database and (2) the ICT 3D HeadPose Database. In addition, we evaluate the framework on a small but challenging dataset of our own, in which the targets perform more complex behaviors than those in the aforementioned public datasets. We show how our approach outperforms relevant state-of-the-art proposals on all of these datasets.
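To make the online stage described above concrete, the following is a minimal sketch, assuming OpenCV's contrib SURF module and Open3D's ICP registration; the KeyFrame container, the Lowe-ratio matching, the numeric thresholds, and the pose composition are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the online stage: match the incoming frame against the
# stored key frames (KFs) with SURF, take the best KF's pose as the
# hypothesis, and refine it with ICP on the depth-derived point cloud.
# The KeyFrame container and all numeric thresholds are assumptions.
from dataclasses import dataclass

import cv2
import numpy as np
import open3d as o3d


@dataclass
class KeyFrame:
    surf_descriptors: np.ndarray      # 2D appearance features (SURF)
    cloud: o3d.geometry.PointCloud    # 3D face points of this key frame
    pose: np.ndarray                  # 4x4 head pose in world coordinates


surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # needs opencv-contrib (non-free)
matcher = cv2.BFMatcher(cv2.NORM_L2)


def estimate_head_pose(gray, cloud, key_frames):
    """Hypothesise the pose from the best-matching KF, then refine it with ICP."""
    _, descriptors = surf.detectAndCompute(gray, None)

    def surf_score(kf):
        # Count Lowe-ratio-filtered SURF matches between frame and key frame.
        pairs = matcher.knnMatch(descriptors, kf.surf_descriptors, k=2)
        return sum(1 for p in pairs if len(p) == 2 and p[0].distance < 0.7 * p[1].distance)

    best_kf = max(key_frames, key=surf_score)

    # Refine: align the KF's cloud to the current cloud (point-to-point ICP).
    result = o3d.pipelines.registration.registration_icp(
        best_kf.cloud, cloud,
        max_correspondence_distance=0.01,   # metres; illustrative value
        init=np.eye(4),
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    # Compose the ICP correction with the KF's stored pose (schematic).
    return result.transformation @ best_kf.pose
```

A global optimization over all KFs, akin to the bundle-adjustment-like step mentioned in the abstract, is not shown here.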

Highlights

  • The head pose provides rich information about the emotional state, behavior, and intentionality of a person

  • Each key frame (KF) consists of a set of 3D appearance features (SURF descriptors projected to the 3D world through the depth image), 3D-based features, and an approximate head pose, represented with a 3D template model (a sketch of the 3D feature extraction follows this list)

  • The variance of our KFv3 proposal is smaller in all cases, making this approach more stable
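As referenced in the second bullet, here is a minimal sketch of how the 3D part of a key frame could be built, assuming Open3D for the depth back-projection and FPFH computation; the radii, neighbour counts, and depth scale are illustrative values, not the authors' settings.

```python
# Hedged sketch: building the 3D part of a key frame with Open3D.
# Depth-to-cloud conversion parameters and search radii are assumptions.
import numpy as np
import open3d as o3d


def build_kf_3d_features(depth, intrinsics):
    """Back-project a depth image and describe the cloud with FPFH features."""
    depth_img = o3d.geometry.Image(depth.astype(np.float32))
    cloud = o3d.geometry.PointCloud.create_from_depth_image(
        depth_img, intrinsics, depth_scale=1000.0)  # mm -> m, illustrative
    # FPFH relies on surface normals; estimate them from local neighbourhoods.
    cloud.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        cloud,
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=100))
    return cloud, fpfh
```

Here, intrinsics would be an o3d.camera.PinholeCameraIntrinsic object describing the depth sensor.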

Summary

Introduction

The head pose provides rich information about the emotional state, behavior, and intentionality of a person. This knowledge is useful in several areas such as human-machine interaction [1], augmented reality [2, 3], expression recognition [4], and driver assistance [5], among others. The task of correctly estimating the head pose with non-invasive systems might seem easy, since many current devices (smartphones or webcams) can detect human faces in videos or images in real time. These detectors are adequate for recreational use, but they cannot handle all the difficulties in head pose estimation (HPE), such as (self-)occlusion, extreme head poses, facial expressions, and fast movements. Some applications use 3D models [9, 10] to retrieve the pose because they

