Abstract

This paper investigates the problem of human pose estimation (HPE) from single 2-dimensional (2D) still images using a convolutional neural network (CNN). The aim was to train the CNN to analyze a 2D input image of a person and determine the person's pose. The CNN output was given in the form of a tree-structured graph of interconnected nodes representing the 2D image coordinates of the person's body joints. A new data-driven tree-based model for HPE was validated and compared to traditional anatomy-based tree structures. The effect of the number of nodes in anatomy-based tree structures on the accuracy of HPE was examined. The tree-based techniques were compared with non-tree-based methods using a common HPE framework and a benchmark dataset. As a result of this investigation, a new hybrid two-stage approach to HPE was proposed. In the first stage, a non-tree-based network was used to generate approximate results, which were then passed for further refinement to the second, tree-based stage. Experimental results showed that both of the proposed methods, the data-driven tree-based model (TD_26) and the hybrid model (H_26_2B), led to very similar results, obtaining 1% higher HPE accuracy compared to the benchmark anatomy-based model (TA_26) and 3% higher accuracy compared to the non-tree-based benchmark (NT_26_A). The best overall HPE results were obtained using the anatomy-based benchmark with the number of nodes increased from 26 to 50, which also significantly increased the computational cost.
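
To make the two-stage idea concrete, the sketch below illustrates, in simplified Python, how a coarse non-tree-based estimate of 2D joint locations could be refined by a second, tree-structured stage that adjusts each joint relative to its parent. This is a minimal illustration of the pipeline described above, not the authors' implementation; the function and joint names are hypothetical, and the joint tree is reduced to a handful of nodes rather than the 26 or 50 used in the paper.

```python
# Hypothetical sketch of the two-stage hybrid idea (not the authors' code).
# Stage 1: a non-tree-based network produces coarse 2D joint estimates.
# Stage 2: a tree-structured refinement adjusts each joint relative to its parent.

import numpy as np

# A reduced, illustrative joint tree: child joint -> parent joint (root has None).
# The paper's models use 26 (or 50) nodes; only a few are shown here.
PARENT = {
    "pelvis": None,
    "spine":  "pelvis",
    "neck":   "spine",
    "head":   "neck",
    "l_hip":  "pelvis",
    "l_knee": "l_hip",
}

def coarse_estimate(image):
    """Stand-in for the stage-1 (non-tree-based) CNN: returns rough (x, y) per joint."""
    rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    return {joint: rng.uniform([0, 0], [w, h]) for joint in PARENT}

def tree_refine(coarse, limb_length=40.0):
    """Stand-in for the stage-2 tree-based refinement: walks the tree from the root
    and pulls each child joint toward a plausible distance from its (already refined) parent."""
    refined = dict(coarse)
    for joint, parent in PARENT.items():   # parents are listed before their children
        if parent is None:
            continue
        offset = refined[joint] - refined[parent]
        norm = np.linalg.norm(offset) or 1.0
        refined[joint] = refined[parent] + offset / norm * limb_length
    return refined

if __name__ == "__main__":
    image = np.zeros((256, 256, 3), dtype=np.uint8)   # placeholder input image
    final = tree_refine(coarse_estimate(image))
    for joint, (x, y) in final.items():
        print(f"{joint:7s} -> ({x:6.1f}, {y:6.1f})")
```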

Highlights

  • Human pose estimation (HPE) from 2-dimensional (2D) images is the process of determining 2D locations of body parts within the image array

  • Inspired by the advantages of "prior-knowledge" methods, this study investigates the integration of structured graphs representing the human pose with a convolutional neural network (CNN)

  • As shown in Table 2 and Figure 8, for the same set of joints, the proposed data-driven tree representation (TD_26) obtained 0.9% higher HPE accuracy compared to the benchmark anatomy-based representation (TA_26 [2])



Introduction

Human pose estimation (HPE) from 2-dimensional (2D) images is the process of determining the 2D locations of body parts (or joints) within the image array. Accurate estimation requires both precise localization of the body parts and determination of the correct relationship between the detected body parts. Determining the relationship between articulated body parts is a highly challenging task. Another important challenge in HPE arises from occlusions between body parts: some body parts can be masked by other parts or by surrounding objects, making HPE even more difficult. Low contrast, cluttered backgrounds, and variations in scene lighting and color can also have a significant effect on HPE accuracy.
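
The accuracy figures quoted in the abstract depend on how joint localization is scored against ground truth. The excerpt does not state the exact metric used, so the sketch below is only a generic, hedged illustration of a common distance-threshold check (a PCK-style measure), not the paper's evaluation code; the function name, joint names, and threshold are all assumptions.

```python
# Generic, illustrative joint-localization check (not the paper's evaluation code).
# An estimated joint counts as correct if it lies within a pixel threshold of the
# ground-truth location; accuracy is the fraction of correctly localized joints.

import numpy as np

def localization_accuracy(estimated, ground_truth, threshold=10.0):
    """estimated, ground_truth: dicts mapping joint name -> (x, y) in pixels."""
    correct = 0
    for joint, gt_xy in ground_truth.items():
        dist = np.linalg.norm(np.asarray(estimated[joint]) - np.asarray(gt_xy))
        correct += dist <= threshold
    return correct / len(ground_truth)

if __name__ == "__main__":
    gt  = {"head": (120, 40), "neck": (120, 70), "l_wrist": (80, 150)}
    est = {"head": (124, 43), "neck": (118, 72), "l_wrist": (95, 170)}
    print(f"accuracy: {localization_accuracy(est, gt):.2f}")
```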

