Abstract

Human pose estimation (HPE) from monocular images has recently become a hot topic, with great potential for applications in health, safety, and security monitoring. However, estimation accuracy and the need for large-scale training datasets remain two main challenges in this area. In this paper, we introduce a depth-guided auto-encoder (AE) network for human pose estimation from monocular images. Specifically, the Image-to-Depth Auto-Encoder of the network is trained during a pre-training stage using depth data. The embedding learned by this AE is then used to train the parameters of the pose-estimation decoder in the network. Guiding the AE with depth data aims to increase human pose estimation accuracy. Furthermore, the AE can be pre-trained using depth information from synthetic data, which greatly reduces the need for training data. For evaluation, the proposed approach is applied to SURREAL, a synthetic dataset. Our depth-guided auto-encoder network outperforms the same network without depth guidance. Experimental results demonstrate the effectiveness of the proposed approach.
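The two-stage training described in the abstract can be sketched in plain NumPy under heavy simplifying assumptions: toy linear layers in place of the paper's deep network, hypothetical dimensions, and a frozen encoder in the second stage. All names and sizes below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (assumptions, not from the paper).
D_IMG, D_EMB, D_DEPTH, D_POSE = 64, 16, 64, 30

# Stage 1: pre-train an image-to-depth auto-encoder on (image, depth) pairs.
W_enc = rng.normal(scale=0.1, size=(D_IMG, D_EMB))    # encoder
W_dep = rng.normal(scale=0.1, size=(D_EMB, D_DEPTH))  # depth decoder

def stage1_step(x_img, y_depth, lr=1e-3):
    """One gradient step on the depth-reconstruction loss ||x W_enc W_dep - y||^2."""
    global W_enc, W_dep
    n = x_img.shape[0]
    z = x_img @ W_enc              # depth-guided embedding
    err = z @ W_dep - y_depth      # depth prediction error
    grad_dep = z.T @ err / n
    grad_enc = x_img.T @ (err @ W_dep.T) / n
    W_dep -= lr * grad_dep
    W_enc -= lr * grad_enc
    return float((err ** 2).mean())

# Stage 2: reuse the depth-guided encoder and train a pose decoder on
# (image, pose) pairs; freezing the encoder here is an assumption.
W_pose = rng.normal(scale=0.1, size=(D_EMB, D_POSE))

def stage2_step(x_img, y_pose, lr=1e-3):
    """One gradient step on the pose loss; only the pose decoder is updated."""
    global W_pose
    n = x_img.shape[0]
    z = x_img @ W_enc              # embedding from the pre-trained encoder
    err = z @ W_pose - y_pose
    W_pose -= lr * z.T @ err / n
    return float((err ** 2).mean())
```

The point of the sketch is the structure, not the layers: depth supervision shapes the embedding in stage 1, and stage 2 trains the pose decoder on top of that embedding, so paired pose labels are only needed in the second stage.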
