Abstract

Head pose estimation from a single image is challenging because of complex background conditions and the characteristics of the human face. In this report, we propose a Multi-stage Regression-Capsule Network (MR-CapsNet) to predict head posture from a single input image. We use residual attention blocks and squeeze-and-excitation blocks to extract features at three levels. CapsNet overcomes shortcomings of the traditional convolutional neural network and implements module aggregation to describe the spatial relationships among features after aggregation. A multi-stage regression scheme keeps the model compact and robust. We tested our method on the AFLW2000 and BIWI datasets, obtaining mean absolute errors of 4.26% and 3.95%, respectively. In addition, we discuss the accuracy of our method when the eyes or mouth are occluded. The results of comprehensive experiments show that our method can accurately predict head posture.
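As a rough illustration of the evaluation metric named above, the following sketch computes a mean absolute error over predicted and ground-truth pose angles. It assumes the error is averaged over the yaw, pitch, and roll components of every sample, which is the common convention but is not spelled out in this summary; the function name and the sample values are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference over every angle of every sample.

    Each sample is a (yaw, pitch, roll) tuple. Averaging over all three
    angles is an assumption, not something stated in the source.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    count = 0
    for pred_pose, true_pose in zip(predicted, actual):
        for pred_angle, true_angle in zip(pred_pose, true_pose):
            total += abs(pred_angle - true_angle)
            count += 1
    return total / count


# Hypothetical poses, for illustration only (not data from the paper).
pred = [(10.0, -5.0, 2.0), (30.0, 0.0, -4.0)]
true = [(12.0, -4.0, 2.0), (27.0, 1.0, -2.0)]
mae = mean_absolute_error(pred, true)
```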

Highlights

  • The development of a variety of perceptual devices has served as the basis for recent advancements in personalized entertainment

  • We applied the capsule structure of the network during the feature aggregation stage of head pose estimation, constructed intermediate capsules using the "vertical and horizontal sliding windows" method to select feature information, and used linear combination between capsules to enhance their representational ability

  • In this study, we developed a deep neural network model, MR-CapsNet, to predict head posture


Summary

INTRODUCTION

The development of a variety of perceptual devices has served as the basis for recent advancements in personalized entertainment. Among model-based methods, Martins [23] proposed a framework to automatically estimate the pose of the human head in a single-view image. We applied the capsule structure of the network during the feature aggregation stage of head pose estimation, constructed intermediate capsules using the "vertical and horizontal sliding windows" method to select feature information, and used linear combination between capsules to enhance their representational ability. The capsule neural network linearly combines the information graphs and passes them through a dynamic routing algorithm to obtain richer feature information. This enhances the network's ability to understand the extracted facial features and reduces the impact of missing facial feature information on the prediction results. We combine the feature maps of the three stages and perform multi-stage regression to obtain the required probability vectors, which improves prediction accuracy.
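The dynamic routing step mentioned above can be sketched as follows. This summary does not give the routing details used in MR-CapsNet, so the sketch assumes the standard routing-by-agreement procedure from Sabour et al. ("Dynamic Routing Between Capsules", 2017); the function names, the three-iteration default, and the toy input are illustrative assumptions, not the authors' implementation.

```python
import math


def squash(s, eps=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iterations=3):
    """Route prediction vectors u_hat[i][j] from input capsule i to output capsule j.

    Assumed standard algorithm: coupling coefficients are a softmax over
    routing logits, and logits grow where predictions agree with outputs.
    """
    num_in, num_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * num_out for _ in range(num_in)]
    outputs = []
    for _ in range(num_iterations):
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(num_out):
            # Weighted sum of the predictions made for output capsule j.
            s_j = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(num_in))
                   for d in range(dim)]
            outputs.append(squash(s_j))
        for i in range(num_in):
            for j in range(num_out):
                # Agreement is the dot product of prediction and current output.
                logits[i][j] += sum(u_hat[i][j][d] * outputs[j][d]
                                    for d in range(dim))
    return outputs


# Toy input: 3 input capsules, 2 output capsules, 2-D vectors (hypothetical).
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[1.0, -0.1], [0.0, -0.1]],
]
v = dynamic_routing(u_hat)
```

In this toy input all three input capsules make similar predictions for the first output capsule, so routing should leave that capsule with the longer output vector.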

FEATURE EXTRACTION
MULTISTAGE REGRESSION
EXPERIMENTAL RESULTS AND DISCUSSION
EXPERIMENTAL CRITERION
EXPERIMENTAL RESULT AND ANALYSIS
2) EVALUATION OF THE LABORATORY MODEL
3) EVALUATION IN THE PARTIALLY OCCLUDED CASE
CONCLUSION AND FUTURE WORK
AUTHOR CONTRIBUTIONS