Abstract

Accurate 3D human pose estimation is important in many fields, such as human-computer interaction, motion recognition, and autonomous driving. In this paper, we take 2D images as input and propose a 3D pose estimation model called Pose ResNet. First, the model uses ResNet50 as the base network and introduces the CBAM attention mechanism to extract features. Then, a waterfall atrous spatial pooling (WASP) module captures multi-scale contextual information from the extracted features and enlarges the receptive field. Finally, the features are fed into a deconvolution network to produce the volume heat map H, which is then processed by a soft argmax function to obtain the joint coordinates. The model achieves a mean per joint position error (MPJPE) of 51.8 mm, outperforming the other approaches it is compared with.
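
The last step of the pipeline, turning a volume heat map H into joint coordinates with a soft argmax and scoring the result with MPJPE, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so several details here are assumptions: the heat map layout `(J, D, H, W)` (joints, depth, height, width), softmax normalization over all voxels of each joint, coordinates expressed in voxel index units, and the function names `soft_argmax_3d` and `mpjpe`, which are hypothetical and not taken from the paper.

```python
import numpy as np


def soft_argmax_3d(heatmap):
    """Differentiable stand-in for argmax over a volume heat map.

    heatmap: array of shape (J, D, H, W) with unnormalized scores
             (layout is an assumption, not specified in the abstract).
    Returns an array of shape (J, 3) with the expected (x, y, z)
    position of each joint, in voxel index units.
    """
    J, D, H, W = heatmap.shape

    # Softmax over all voxels of each joint; subtracting the max
    # keeps the exponentials numerically stable.
    flat = heatmap.reshape(J, -1)
    flat = flat - flat.max(axis=1, keepdims=True)
    prob = np.exp(flat)
    prob /= prob.sum(axis=1, keepdims=True)
    prob = prob.reshape(J, D, H, W)

    # Marginalize onto each axis, then take the expected index.
    x = (prob.sum(axis=(1, 2)) * np.arange(W)).sum(axis=1)
    y = (prob.sum(axis=(1, 3)) * np.arange(H)).sum(axis=1)
    z = (prob.sum(axis=(2, 3)) * np.arange(D)).sum(axis=1)
    return np.stack([x, y, z], axis=1)


def mpjpe(pred, target):
    """Mean per joint position error: the average Euclidean distance
    between predicted and ground-truth joints, in the input units."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

Unlike a hard argmax, the expectation is differentiable with respect to the heat map, so a coordinate loss can be backpropagated through it; it also yields sub-voxel positions. To report MPJPE in millimetres, the voxel coordinates would first have to be mapped to metric space, a step this sketch leaves out.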
