Abstract

Current research tends to use low-resolution heat maps and then recover the original high-resolution representation in order to save computing cost. This results in unsatisfactory detection performance, especially for human body recognition against highly complex backgrounds. To address this problem, we propose a model that connects multiple sub-networks of different resolution levels in parallel to a high-resolution main network, so that a high-resolution heat map is maintained throughout the whole process. By using this structure in the human key point vector field network, the accuracy of human posture recognition is improved while high-speed operation is retained. To validate the effectiveness of the proposed model, two common benchmarks, the COCO key point data set and the MPII human posture data set, are used for evaluation. Experimental results show that our network achieves 72.3% AP and 92.2% AP on the two data sets, respectively, which is 3%-4% higher than existing mainstream methods. In our tests, only SimpleBaseline with a ResNet-152 backbone comes close to our accuracy, yet our network has a much lower computing cost.

Highlights

  • Human posture estimation is one of the important applications of deep convolutional neural networks in behavior perception [1]–[3]

  • Most mainstream human posture estimation networks use a person detector mechanism and directly apply top-down single-person posture estimation, such as the 3D-Mask R-CNN detection model proposed by He et al. [7], Johnson [8], and Huang and Zhong [9]

  • The Online Pose Tracking framework proposed by Guanghan Ning et al. detects human candidate objects in the first frame and uses a single-person posture classifier to track the position and posture of each candidate object


Summary

INTRODUCTION

Human posture estimation is one of the important applications of deep convolutional neural networks in behavior perception [1]–[3]. The person detector mechanism used by most mainstream networks is more likely to lose detected objects in blurred regions, since most of these networks are composed of a series of main networks running from high resolution to low resolution [13], [14]. To address this issue, we designed the MEPDN (MultiEnhance-Pose-Detection-Net) network, which combines the high-speed bottom-up human key point vector field detection network with a parallel multi-level high-resolution network (DeepResolution-Net). MEPDN replaces the symmetrically structured heat map produced by multi-stage convolution in the human key point vector field detection network with a parallel, multi-stage high-resolution heat map that progresses stage by stage; this structure is referred to as DeepResolution-Net in this paper. We compare our network with mainstream networks on standard performance indicators, evaluate it in various scenarios on two mainstream test sets, and obtain favorable results on several key indicators of posture recognition performance.
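
As an illustration of the parallel multi-resolution idea described above, the following is a minimal, dependency-free Python sketch, not the authors' implementation. It keeps a full-resolution map alive through several stages while a half-resolution branch runs beside it, and the two branches exchange information at every stage. The function names (`downsample2x`, `upsample2x`, `add_maps`, `fuse`, `argmax2d`), the average-pooling and nearest-neighbour choices, the element-wise sum used for fusion, and the 8x8 toy map are all assumptions made for the example; the learned convolutions of a real network are omitted.

```python
# Toy sketch of two parallel branches at different resolutions.
# All names and operator choices here are hypothetical, for illustration only.

def downsample2x(fmap):
    """Average-pool a 2D map (list of rows) by 2; height and width must be even."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[y][x] + fmap[y][x + 1] + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upsample2x(fmap):
    """Nearest-neighbour upsample a 2D map by 2 in both directions."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def add_maps(a, b):
    """Element-wise sum of two maps of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fuse(high, low):
    """One exchange step: each branch keeps its own resolution and
    receives the other branch resampled to that resolution."""
    new_high = add_maps(high, upsample2x(low))
    new_low = add_maps(low, downsample2x(high))
    return new_high, new_low

def argmax2d(fmap):
    """Return (row, col) of the largest value in a 2D map."""
    best = max((v, y, x) for y, row in enumerate(fmap) for x, v in enumerate(row))
    return best[1], best[2]

# An 8x8 map with a single activation standing in for one key point.
high = [[0.0] * 8 for _ in range(8)]
high[3][5] = 1.0
low = downsample2x(high)          # the parallel half-resolution branch

for _ in range(3):                # three stages of parallel processing
    high, low = fuse(high, low)

peak = argmax2d(high)             # read the location at full resolution
print(peak)                       # (3, 5)
```

Because the high-resolution branch is never discarded, the key point location can be read directly from the full-resolution map after the last stage rather than recovered from a coarser one.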

RELATED WORK
HUMAN KEY POINT DETECTION AND PAF NETWORK
EXPERIMENT
Findings
CONCLUSION