Abstract

Top-down human pose estimation methods usually face the following problems: (i) the detection results are not well exploited by the pose estimation network; (ii) human detection is difficult in crowded scenes; (iii) complicated models lead to long training times. To address these issues, a lightweight multi-person pose estimation method based on symmetric transformation and global matching is proposed. The symmetric transformation module adds a spatial transformer network (STN) before and a spatial de-transformer network (SDTN) after the single-person pose estimation (SPPE) network to extract high-quality single-person pose regions from inaccurate human candidate frames. The global matching method transforms the key point prediction problem into an optimal matching problem on the human body-key point graph, which addresses misjudged and erroneous detections in crowded scenes. Finally, depthwise separable convolution and inverted residual blocks are used to reduce model complexity, improving running speed while maintaining accuracy. Experiments show that the proposed algorithm not only enhances the overall performance of the multi-person pose estimation network in crowded scenes but also improves running speed significantly, which further confirms the effectiveness and competitiveness of the algorithm.
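To make the complexity reduction from depthwise separable convolution concrete, the sketch below counts weights for a single layer. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative assumptions, not values from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)        # 73728
sep = depthwise_separable_params(k, c_in, c_out)  # 8768
ratio = sep / std                                 # equals 1/c_out + 1/k**2

print(std, sep, round(ratio, 4))
```

For this hypothetical layer the separable version needs roughly 12% of the weights of the standard convolution, which is the kind of saving a lightweight design relies on.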

Highlights

  • Human pose estimation [1], [40], one of the basic research tasks of computer vision [2], has long attracted the attention of researchers and is widely used in security monitoring, movie action special-effects modeling, human behavior prediction, human-computer interaction, virtual reality, etc.

  • One is the top-down approach: a detector first locates each person to obtain a human candidate frame, and each candidate frame is then sent to the single-person pose estimation network to obtain the result.

  • (2) To address the difficulty of detecting human bodies in crowded scenes, this paper proposes a global matching method that optimizes the loss function to reduce misjudged and missed key points in the crowded state.

Summary

INTRODUCTION

Human pose estimation [1], [40], one of the basic research tasks of computer vision [2], has long attracted the attention of researchers and is widely used in security monitoring, movie action special-effects modeling, human behavior prediction, human-computer interaction, virtual reality, etc.

PROPOSED APPROACH

A lightweight multi-person pose estimation model based on symmetric transformation is introduced, and the global matching method is used to transform the key point prediction problem into the optimal matching problem of the human body-key point graph, which further improves the overall performance of the model. The fourth step is to match the human body-key point graph with the global matching method to obtain the final multi-person pose estimation result. The symmetric transformation module uses the idea of the spatial transformer network (STN) [11] to extract high-quality human candidate frames. In view of the above problems, this paper designs the global matching method, which transforms the key point prediction problem into an optimal matching problem on the human body-key point graph based on the KM algorithm; the process is given in Algorithm 1.

1) Loss function

The traditional human body pose estimation network relies heavily on the results of human body detection.

If a key point v_j^k is a candidate key point of the human body h_i, an edge e_{i,j}^k is established.
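The optimal-matching idea above can be illustrated on a toy human body-key point graph for a single joint type. This is only a sketch under stated assumptions: the edge weights are made-up scores, `best_matching` is a hypothetical helper, and an exhaustive search over assignments stands in for the KM algorithm (it finds the same maximum-weight matching on tiny inputs but does not scale). It also assumes there are at least as many candidate key points as humans.

```python
from itertools import permutations


def best_matching(weights):
    """Return (total, assignment) with the largest summed edge weight.

    weights[i][j] is a hypothetical score for the edge between human i and
    candidate key point j; None means no edge was established.
    assignment[i] is the index of the key point given to human i.
    """
    n_humans = len(weights)
    n_points = len(weights[0])
    best_total, best_assign = float("-inf"), None
    # Brute-force stand-in for KM: try every one-to-one assignment.
    for perm in permutations(range(n_points), n_humans):
        if any(weights[i][j] is None for i, j in enumerate(perm)):
            continue  # this assignment uses an edge that does not exist
        total = sum(weights[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_assign = total, perm
    return best_total, best_assign


# Two overlapping people and three candidate key points (made-up scores).
weights = [
    [0.90, 0.80, None],
    [0.85, 0.10, None],
]
total, assign = best_matching(weights)
print(assign, round(total, 2))
```

In this toy case, letting each human greedily take its highest-scoring key point would give both humans the same point; the global assignment instead gives human 0 its second-best candidate so that the total score over the graph is highest.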

3) Objective function
Method
IMPLEMENTATION DETAILS
PERFORMANCE EVALUATION
ABLATION STUDY
Findings
CONCLUSIONS