Abstract
Head pose estimation from a single image is challenging because of complex background conditions and the characteristics of the human face. In this report, we propose a Multi-stage Regression-Capsule Network (MR-CapsNet) that predicts head pose from a single input image. We use residual attention blocks and squeeze-and-excitation blocks to extract features at three levels. The capsule network (CapsNet) overcomes shortcomings of traditional convolutional neural networks. It aggregates feature modules and describes the spatial relationships among the aggregated features, and a multi-stage regression scheme keeps the model compact and robust. We tested our method on the AFLW2000 and BIWI datasets, obtaining mean absolute errors of 4.26% and 3.95%, respectively. We also discuss the accuracy of our method when the eyes or mouth are occluded. Comprehensive experiments show that our method predicts head pose accurately.
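The abstract names the squeeze-and-excitation (SE) block as one of the feature extractors. As a minimal sketch, assuming the standard SE formulation (global average pooling, a bottleneck MLP, and sigmoid channel gates) on a channel-first feature map, its forward pass looks like the code below. The function names, the reduction ratio `r = 4`, and the random weights are illustrative assumptions, not the configuration used in MR-CapsNet.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excitation(x, w1, b1, w2, b2):
    """Standard SE forward pass (assumed form).

    x:  feature map of shape (C, H, W)
    w1: (C // r, C) reduction weights, w2: (C, C // r) expansion weights
    Returns the recalibrated map and the per-channel gates.
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU), then sigmoid gates in (0, 1)
    hidden = np.maximum(0.0, w1 @ z + b1)
    gates = sigmoid(w2 @ hidden + b2)
    # Scale: reweight every channel of the input by its gate
    return x * gates[:, None, None], gates

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)
y, gates = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape)  # (8, 4, 4)
```

The block only rescales channels, so the output keeps the input shape; the gates let the network emphasize channels that respond to informative facial regions.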
Highlights
The development of a variety of perceptual devices has served as the basis for recent advancements in personalized entertainment
We applied the capsule structure during the feature aggregation stage of head pose estimation, constructed intermediate capsules with a vertical and horizontal sliding-window method to select feature information, and used linear combinations between capsules to enhance their representational ability
In this study, we developed a deep neural network model, MR-CapsNet, to predict head pose
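The highlights do not define the vertical and horizontal sliding windows precisely. One plausible reading, offered purely as an illustrative assumption, is that horizontal windows cover a band of rows across the full width, vertical windows cover a band of columns across the full height, and each window is average-pooled into one capsule vector; a mixing matrix then forms new capsules as linear combinations of these. The function names, window size `k`, `stride`, and matrix `mix` are hypothetical.

```python
import numpy as np

def sliding_window_capsules(feat: np.ndarray, k: int, stride: int) -> np.ndarray:
    """Hypothetical intermediate-capsule construction.

    feat: feature map of shape (C, H, W).
    Returns (N, C): one C-dimensional capsule per window.
    """
    C, H, W = feat.shape
    caps = []
    # Horizontal windows: k rows spanning the full width
    for top in range(0, H - k + 1, stride):
        caps.append(feat[:, top:top + k, :].mean(axis=(1, 2)))
    # Vertical windows: k columns spanning the full height
    for left in range(0, W - k + 1, stride):
        caps.append(feat[:, :, left:left + k].mean(axis=(1, 2)))
    return np.stack(caps)

def combine_capsules(caps: np.ndarray, mix: np.ndarray) -> np.ndarray:
    """Each output capsule is a linear combination of the input capsules.

    caps: (N, C), mix: (M, N) (assumed learnable weights) -> (M, C)
    """
    return mix @ caps

feat = np.arange(32, dtype=float).reshape(2, 4, 4)  # toy map: 2 channels, 4x4
caps = sliding_window_capsules(feat, k=2, stride=2)  # 2 horizontal + 2 vertical
mixed = combine_capsules(caps, np.full((1, 4), 0.25))  # plain average as the mix
print(caps[:, 0])  # [ 3.5 11.5  6.5  8.5]
```

With a uniform mixing matrix the combined capsule reduces to the per-channel mean of the whole map; a learned `mix` would instead weight the bands that carry the most pose information.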
Summary
The development of a variety of perceptual devices has served as the basis for recent advancements in personalized entertainment. Among model-based methods, Martins [23] proposed a framework to automatically estimate the pose of the human head in a single-view image. We applied the capsule structure during the feature aggregation stage of head pose estimation, constructed intermediate capsules with a vertical and horizontal sliding-window method to select feature information, and used linear combinations between capsules to enhance their representational ability. The capsule network linearly combines the feature maps and passes them through a dynamic routing algorithm to obtain richer feature information. This strengthens the network's understanding of the extracted facial features and reduces the impact of missing facial information on the predictions. We combine the feature maps of the three stages and perform multi-stage regression to obtain the required probability vectors, which improves prediction accuracy.
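The two mechanisms in the paragraph above, dynamic routing and multi-stage regression over probability vectors, can be sketched as follows. The routing follows the routing-by-agreement scheme of the original CapsNet (Sabour et al., 2017). The regression is a simplified soft stagewise regression in the style of SSR-Net, without its learned index-shift and range-scaling terms. Whether MR-CapsNet uses exactly these forms is an assumption, and the angle range of [-99, 99] degrees and the 3 bins per stage in the usage lines are hypothetical.

```python
import numpy as np

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """CapsNet non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat: np.ndarray, iterations: int = 3):
    """Routing-by-agreement.

    u_hat: (N_in, N_out, D) predictions of each input capsule for each output capsule.
    Returns output capsules v (N_out, D) and coupling coefficients c (N_in, N_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits
    for _ in range(iterations):
        c = softmax(b, axis=1)                     # each input splits its vote over outputs
        s = np.einsum("io,iod->od", c, u_hat)      # coupling-weighted sum of predictions
        v = squash(s)                              # output capsules
        b = b + np.einsum("iod,od->io", u_hat, v)  # raise logits where predictions agree with v
    return v, c

def stagewise_regression(stage_probs, low: float, high: float) -> float:
    """Simplified SSR-Net-style soft stagewise regression (assumed form).

    stage_probs: list of probability vectors; stage k has s_k bins.
    Stage k refines the estimate at resolution 1 / (s_1 * ... * s_k) of the range.
    """
    frac = 0.0
    denom = 1.0
    for p in stage_probs:
        s_k = len(p)
        denom *= s_k
        frac += float(np.dot(p, np.arange(s_k))) / denom  # expected bin index, scaled
    return low + (high - low) * frac

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((6, 2, 4))  # 6 input capsules, 2 outputs, 4-D poses
v, c = dynamic_routing(u_hat)
# Three stages (one per feature level), 3 bins each, hypothetical range [-99, 99]
yaw = stagewise_regression(
    [np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])],
    -99.0, 99.0,
)  # approximately 11.0
```

Because each stage only has to choose among a few coarse bins, later stages refine the residual of earlier ones, which is what allows stagewise regression to stay compact.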