Abstract

Head pose estimation is a core sensing component of intelligent surveillance applications such as human behavior analysis, intelligent driver assistance, visual attention modeling, and monitoring. These systems require accurate face alignment and prediction of head movement direction. Previous methods depend heavily on facial landmarks and depth information: the head pose is usually measured by estimating several keypoints, which require a correct head pose mapping to produce accurate results. Moreover, landmark-based approaches degrade when the face is occluded or poorly visible. This paper proposes a head pose estimation method that handles varied facial conditions, such as occlusion and challenging viewpoints. We combine coarse and fine feature-map classification to train a multi-loss deep Convolutional Neural Network (CNN) that predicts precise Euler angles (yaw, pitch, roll) of the head without keypoints or landmarks. The proposed method uses more quantization units for angle classification, learning a coarse-to-fine structure mapping that yields better spatial clustering of features from a single RGB camera image. Experiments are performed on benchmark datasets and on real-world head poses. The mean absolute error of prediction is 5.06°, 4.06°, and 2.96° on the AFLW2000, AFLW, and BIWI datasets, respectively, a significant improvement over previous head pose estimation methods. Additionally, the proposed method outperforms previous approaches in computation time, running at 11 frames per second, which is beneficial for real-life applications.
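To make the multi-loss, binned-angle formulation concrete, below is a minimal PyTorch sketch of this family of approach. It is not the paper's exact architecture: the ResNet-50 backbone, the 66 bins of 3° over [-99°, 99°), and the loss weight alpha are assumptions borrowed from common landmark-free head pose formulations (e.g., Hopenet-style training), and the paper's specific coarse-plus-fine quantization scheme may differ. Each Euler angle gets its own classification head; a cross-entropy loss on the angle bins is combined with a mean-squared-error loss on the continuous angle recovered as the softmax-weighted expectation over bin centers.

```python
import torch
import torch.nn as nn
import torchvision

# Assumed bin setup (not from the paper): 66 bins of 3 degrees over [-99, 99).
NUM_BINS = 66
idx_tensor = torch.arange(NUM_BINS, dtype=torch.float32)

class HeadPoseNet(nn.Module):
    """ResNet-50 backbone with three per-angle classification heads."""
    def __init__(self, num_bins=NUM_BINS):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()  # strip the ImageNet classifier; output is 2048-dim
        self.backbone = backbone
        self.fc_yaw = nn.Linear(2048, num_bins)
        self.fc_pitch = nn.Linear(2048, num_bins)
        self.fc_roll = nn.Linear(2048, num_bins)

    def forward(self, x):
        feat = self.backbone(x)
        return self.fc_yaw(feat), self.fc_pitch(feat), self.fc_roll(feat)

def multi_loss(logits, bin_labels, cont_labels, alpha=0.5):
    """Cross-entropy on angle bins plus MSE on the expected continuous angle."""
    cls_loss = nn.functional.cross_entropy(logits, bin_labels)
    probs = nn.functional.softmax(logits, dim=1)
    # Soft expectation over bin indices, mapped back to degrees.
    pred_deg = torch.sum(probs * idx_tensor.to(logits.device), dim=1) * 3 - 99
    reg_loss = nn.functional.mse_loss(pred_deg, cont_labels)
    return cls_loss + alpha * reg_loss

# Example: one training step on random data (batch of 8 RGB face crops).
model = HeadPoseNet()
imgs = torch.randn(8, 3, 224, 224)
bin_gt = torch.randint(0, NUM_BINS, (8,))       # ground-truth yaw bins
deg_gt = bin_gt.float() * 3 - 99                # matching continuous angles
yaw, pitch, roll = model(imgs)
loss = multi_loss(yaw, bin_gt, deg_gt)          # repeat for pitch and roll in practice
loss.backward()
```

The expectation over softmax probabilities is what lets a coarse classification grid produce a fine-grained continuous prediction: the classification term stabilizes training, while the regression term penalizes predictions that land in a nearby but wrong bin.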
