Abstract

Hard-joint localization in human pose estimation is challenging for several reasons: joints can become invisible because of clothing or lighting, they can be occluded in complex environments, and the dependencies among joints can be disrupted. Most existing approaches to hard-joint pose estimation achieve high accuracy by extracting more high-level feature information. However, most networks suffer from information loss caused by down-sampling, which in turn degrades joint location information. Compensating for this loss introduces irrelevant information into network learning, which hinders the extraction of useful features associated with hard joints. Herein, a residual down-sampling module is proposed to address the information loss: it replaces the pooling layer for down-sampling and fuses high-level features with low-resolution feature maps. A strategy based on the attention mechanism is also proposed to guide network learning so that the network focuses on useful feature information. Specifically, a convolutional block attention module is combined with a residual module outside the basic sub-network, so that the network can learn more effective high-level features. An eight-stack hourglass network is used as the basic network, and the proposed method is validated on the MPII and LSP human pose datasets. Compared with the eight-stack hourglass network and HRNet, the proposed method achieves higher accuracy for hard-joint localization. The experimental results show that the proposed method is effective for hard-joint localization.
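To make the attention idea above more concrete, the following is a minimal sketch of the channel-attention half of a convolutional block attention module, written in plain Python. It is not the authors' implementation: the function names, the toy feature map, and the hand-picked weights `w1` and `w2` are all hypothetical, and the spatial-attention half and the residual connection are omitted. It only illustrates how average-pooled and max-pooled channel descriptors pass through a shared two-layer MLP to produce per-channel weights that rescale the features.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU, shared by the avg- and max-pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature_map, w1, w2):
    """Return one weight in (0, 1) per channel for a C x H x W nested list."""
    flat = [[v for row in channel for v in row] for channel in feature_map]
    avg_desc = [sum(ch) / len(ch) for ch in flat]   # global average pooling
    max_desc = [max(ch) for ch in flat]             # global max pooling
    from_avg = shared_mlp(avg_desc, w1, w2)
    from_max = shared_mlp(max_desc, w1, w2)
    # Sum the two MLP outputs element-wise, then squash with a sigmoid.
    return [sigmoid(a + m) for a, m in zip(from_avg, from_max)]


def apply_channel_attention(feature_map, weights):
    """Scale every value in each channel by that channel's weight."""
    return [
        [[v * s for v in row] for row in channel]
        for channel, s in zip(feature_map, weights)
    ]


# Toy input (assumption): 2 channels of size 2 x 2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
# Hypothetical, hand-picked weights; in a real network these are learned.
w1 = [[0.5, -0.5]]      # 2 channels -> 1 hidden unit (reduction ratio 2)
w2 = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channels

weights = channel_attention(features, w1, w2)
refined = apply_channel_attention(features, weights)
```

With these toy weights the first channel receives a larger weight than the second, so its features are largely preserved while the second channel is suppressed. In a trained network the same mechanism would let the model emphasize channels that carry information relevant to hard joints.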
