Abstract

Most existing whole-body human pose estimation methods segment the hands and feet from the body and process them separately, which not only breaks up the overall semantics of the body but also increases the computational cost and the complexity of the model. To address these drawbacks, we design a novel semantic–structural graph convolutional network (SSGCN) for whole-body human pose estimation, which leverages the whole-body graph structure to analyze the semantics of the whole-body keypoints through a graph convolutional network and improves the accuracy of pose estimation. First, we introduce a novel heat-map-based keypoint embedding, which encodes the position information and feature information of the keypoints of the human body. Second, we propose a novel semantic–structural graph convolutional network consisting of several sets of cascaded structure-based graph layers and data-dependent whole-body non-local layers. Specifically, the proposed method extracts groups of keypoints and constructs a high-level abstract body graph to process the high-level semantic information of the whole-body keypoints. Experimental results show that our method achieves promising results on the challenging COCO whole-body dataset.

Highlights

  • Human pose estimation is a challenging computer vision task, which aims to locate the human body keypoints in images and videos

  • This work presents a novel graph convolutional network framework for whole-body human pose estimation, which leverages the whole-body graph structure to analyze the semantics of each part of the body

  • We propose a novel heat-map-based keypoint embedding module, which encodes the position information and feature information of the keypoints of the human body

  • The proposed semantic–structural graph convolutional network consists of a structure-based graph layer to capture skeleton structure information and a data-dependent non-local layer to analyze the long-range grouped joint features

  • We represent groups of keypoints and construct a high-level abstract body graph to process the high-level semantic information of the whole-body keypoints

  • We perform semantic fusion of whole-body poses based on the whole-body skeleton and leverage the heat-map-based graph convolutional network to calibrate whole-body human pose estimation
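As a rough illustration of what a heat-map-based keypoint embedding could look like, the sketch below turns each keypoint heat map into a position (the expected pixel coordinate under a softmax of the heat map) and a feature vector (the heat-map-weighted average of a backbone feature map). The softmax weighting, the function name `keypoint_embedding`, and the array shapes are assumptions made for this example; the summary does not specify how the paper computes its embeddings.

```python
import numpy as np

def keypoint_embedding(heatmaps, features):
    """Hypothetical heat-map-based embedding (an assumption, not the paper's exact module).

    heatmaps: (K, H, W) raw keypoint scores, one map per keypoint.
    features: (C, H, W) backbone feature map at the same resolution.
    Returns positions (K, 2) as (x, y) in pixels and feature embeddings (K, C).
    """
    K, H, W = heatmaps.shape
    C = features.shape[0]

    # Softmax over all pixels so each heat map becomes a probability map.
    flat = heatmaps.reshape(K, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    prob = np.exp(flat)
    prob /= prob.sum(axis=1, keepdims=True)        # (K, H*W), rows sum to 1

    # Pixel coordinate grid, flattened in the same row-major order as the maps.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (H*W, 2)

    positions = prob @ coords                      # expected (x, y) per keypoint
    feats = prob @ features.reshape(C, -1).T       # weighted feature average, (K, C)
    return positions, feats

# Usage with random inputs: 133 keypoints is the COCO whole-body count.
rng = np.random.default_rng(0)
pos, emb = keypoint_embedding(rng.normal(size=(133, 64, 48)),
                              rng.normal(size=(32, 64, 48)))
print(pos.shape, emb.shape)
```

The two outputs can be concatenated per keypoint to form the node features of a whole-body graph, which is one plausible way to feed a graph convolutional network.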

Summary

Introduction

Human pose estimation is a challenging computer vision task, which aims to locate the human body keypoints in images and videos. Our main contributions are summarized as follows: (1) this work presents a novel graph convolutional network framework for whole-body human pose estimation, which leverages the whole-body graph structure to analyze the semantics of each part of the body; (2) we propose a novel heat-map-based keypoint embedding module, which encodes the position information and feature information of the keypoints of the human body; (3) the proposed semantic–structural graph convolutional network consists of a structure-based graph layer to capture skeleton structure information and a data-dependent non-local layer to analyze the long-range grouped joint features; (4) we represent groups of keypoints and construct a high-level abstract body graph to process the high-level semantic information of the whole-body keypoints.
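To make the three graph components named in the contributions more concrete, here is a minimal NumPy sketch under stated assumptions: the structure-based graph layer is written as a standard graph convolution over a symmetrically normalized skeleton adjacency, the data-dependent non-local layer as scaled dot-product attention with a residual connection, and the keypoint groups as mean pooling into group nodes. All function names, the normalization, and the pooling choice are illustrative assumptions; the summary does not give the paper's exact formulations.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a skeleton graph."""
    A = np.eye(num_nodes)                      # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0                # undirected bone connections
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def structure_graph_layer(X, A_hat, W):
    """Fixed-structure layer: aggregate skeleton neighbours, project, apply ReLU."""
    return np.maximum(A_hat @ X @ W, 0.0)

def non_local_layer(X, W_q, W_k, W_v):
    """Data-dependent layer: every keypoint attends to every other keypoint.

    W_v must map back to X's feature size so the residual connection is valid.
    """
    q, k, v = X @ W_q, X @ W_k, X @ W_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over keypoints
    return X + attn @ v

def group_pool(X, groups):
    """Average the keypoints in each group to form nodes of an abstract body graph."""
    return np.stack([X[idx].mean(axis=0) for idx in groups])

# Usage on a toy 5-keypoint chain with 8-dimensional node features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
A_hat = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], 5)
H = structure_graph_layer(X, A_hat, rng.normal(size=(8, 8)))
H = non_local_layer(H, *(rng.normal(size=(8, 8)) for _ in range(3)))
G = group_pool(H, [[0, 1], [2], [3, 4]])         # three hypothetical groups
print(H.shape, G.shape)
```

The fixed adjacency encodes only skeleton connectivity, while the attention weights are recomputed from the input features, which is what makes the second layer data-dependent and able to relate distant joints such as a hand and a foot.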

Human Pose Estimation
Whole-Body Pose Estimation
Heat-Map-Based Skeletal–Structural Graph Convolutional Network
Heat-Map-Based Keypoint Position Embedding
Heat-Map-Based Keypoint Feature Embedding
Skeletal–Structural Graph Convolutional Network
Structure-Based Graph Layer
Data-Dependent Non-Local Layer
Keypoint Group Representations
Keypoint-Based Pose Estimation
Loss Functions
Datasets and Metrics
Implementation Details
Experimental Results
Method
Analysis
Conclusions