Abstract

Vision-based human pose estimation is regarded as a challenging research subject because of problems such as confounding background clutter, the diversity of human appearance, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap predictions of body joints, the proposed machine repeatedly produces predictions across its stages, each with a receptive field large enough to learn long-range spatial relationships. The stages are composed of different modules according to their purposes. A pyramid stacking module and a dilation module handle the problem of human pose at multiple scales. The multi-scale information from their different receptive fields is fused by concatenation, which captures more contextual information from the different features. In addition, the spatial and channel information of a given input is converted into gating factors by squeezing each feature map to a single numeric value that reflects its importance, so that each network channel receives a different weight. In experiments on the standard LSP and MPII pose benchmarks, the proposed architecture achieved higher accuracy than other ConvNet-based architectures.
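The channel-gating idea mentioned above can be illustrated with a minimal sketch. It assumes a squeeze-and-excitation style design (global average pooling, two small fully connected layers, a sigmoid); the abstract does not specify the exact layers, and the sketch covers only the channel branch, not the spatial one. All names here (`squeeze`, `dense`, `channel_gate`, and the weight arguments) are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def squeeze(feature_maps):
    """Global average pooling: reduce each H x W channel to one number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def channel_gate(feature_maps, w1, b1, w2, b2):
    """Assumed gating scheme: squeeze -> dense + ReLU -> dense + sigmoid.

    Returns the per-channel gating factors and the re-weighted feature maps.
    """
    pooled = squeeze(feature_maps)
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return gates, scaled


# Two channels of size 2 x 2; the weights are placeholders, not trained values.
feature_maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 4.0], [6.0, 8.0]],
]
w1, b1 = [[0.0, 0.0]], [0.0]          # 2 channels -> 1 hidden unit
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]   # 1 hidden unit -> 2 gates
gates, scaled = channel_gate(feature_maps, w1, b1, w2, b2)
```

With trained weights the gates would differ per channel, so that informative channels are emphasized and less useful ones are suppressed before the next stage.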
