Abstract

Automated human pose estimation is emerging as an active research area within human activity detection. Its applications, such as malpractice detection in examinations, distracted driving detection, and gesture detection, require robust and reliable pose estimation techniques. These applications map the attention of the user using head pose estimation (HPE) metrics supported by emotion and gaze analysis. This paper addresses the problem of attention score estimation using HPE. The proposed method aims for ease of implementation while estimating head pose from 68 facial features. To attain reliability and precision, head pose estimation is implemented as a regression task: the coordinate pair angle method (CPAM) is combined with deep neural network (DNN) regression and elastic net regression. The DNN is used to maintain precision on low-light, distorted, or occluded images. The CPAM methodology leverages facial landmark detection and angular differences to estimate head pose. Experimental results showed that the proposed model could handle large datasets, real-time data processing, significant pose variations, partial occlusions, and diverse facial expressions with a mean absolute error (MAE) of 3° or less. The proposed system was evaluated on three standard databases: the 300W Across Large Poses (300W-LP) dataset, the Annotated Facial Landmarks in the Wild (AFLW2000) dataset, and the National Institute of Mental Health Child Emotional Faces Picture Set (NIMH-ChEFS) dataset. The results achieved are on par with recent state-of-the-art methodologies such as anisotropic angle distribution learning (AADL), the joint head pose estimation and face alignment algorithm (JFA), and the rotation axis focused attention network (RAFA-Net), with MAE values ranging up to 6°. The results are promising for attention span prediction using head pose estimation and for many possible future applications.
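As a rough illustration of the two ideas the abstract relies on (angles computed from pairs of landmark coordinates, and MAE as the evaluation metric), the sketch below may help. It is not the authors' CPAM implementation: the function names, the choice of landmark pairs, and the use of the image x-axis as the reference direction are all assumptions made for illustration.

```python
import math


def pair_angle_deg(p, q):
    """Angle in degrees of the line from point p to point q,
    measured against the image x-axis (an assumed convention)."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    return math.degrees(math.atan2(dy, dx))


def pair_angle_features(landmarks, pairs):
    """Build a feature vector of angles for the given index pairs.

    `landmarks` is a sequence of (x, y) points, e.g. 68 facial landmarks.
    `pairs` is a hypothetical list of (i, j) index pairs; the paper's
    actual pair selection is not given in this section.
    """
    return [pair_angle_deg(landmarks[i], landmarks[j]) for i, j in pairs]


def mean_absolute_error(predicted, actual):
    """Mean absolute error between predicted and ground-truth angles (degrees)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Toy landmarks and pairs, purely illustrative.
    toy_landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    toy_pairs = [(0, 1), (0, 2)]
    print(pair_angle_features(toy_landmarks, toy_pairs))

    # Toy yaw predictions versus ground truth, in degrees.
    print(mean_absolute_error([10.0, -5.0, 30.0], [12.0, -4.0, 27.0]))
```

In a regression setup like the one described, a feature vector of this kind would be the input to the regressor, and the MAE would be computed separately per head pose angle on a held-out set.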

Highlights

  • Mapping the attention span of users has become a necessity in recent decades in almost all spheres of industry, including education [1][2], medicine [3][4], advertising [5], marketing [6], and many more

  • The proposed system was evaluated on three standard databases: the 300W Across Large Poses (300W-LP) dataset, the Annotated Facial Landmarks in the Wild (AFLW2000) dataset, and the National Institute of Mental Health Child Emotional Faces Picture Set (NIMH-ChEFS) dataset

  • It is claimed that a convolutional neural network (CNN) could be considered one of the best algorithms to work on real-world datasets [14]

Summary

Introduction

Mapping the attention span of users has become a necessity in recent decades in almost all spheres of industry, including education [1][2], medicine [3][4], advertising [5], marketing [6], and many more. To meet these real-world challenges of the digital world, research should focus on rapid estimation of head pose angles and on overcoming problems caused by lighting conditions [7], blurring [8], occlusion [9], or environmental conditions [10].
