Abstract

Cross-view 3D human pose estimation models have made significant progress: by fusing multiple views, they localize human joints and model the skeleton in 3D more accurately. The multi-view 2D pose estimation stage of such a model, which uses deep networks to generate a heatmap for each view, is very important, but it is also expensive to train. In this article, we therefore tested several recent deep networks on the pose estimation task: Mobilenetv2, Mobilenetv3, Efficientnetv2 and Resnet. Based on the performance and drawbacks of these networks, we then built several deep networks with better performance. We call them LHPE-nets; they mainly comprise the Low-Span network and the RDNS network. LHPE-nets combine a structure with evenly distributed channels, inverted residuals, external residual blocks and a framework for processing small-resolution samples, so that training saturates faster. We also designed a static pose sample simplification method for 3D pose data, which stores samples at low cost and makes them convenient for models to read. In the experiments, we compared against several recent models using two public evaluation metrics. The results show the advantages of this work in fast start-up and network size: during training it is about 1-5 epochs faster than Resnet-34. They also show improved accuracy across joints: estimation improves for approximately 60% of the joints, and overall human pose estimation outperforms the other networks by more than 7 mm. The experiments analyze in detail the network size, the fast start-up and the 2D and 3D pose estimation performance of the proposed model. Compared with other pose estimation models, its performance also reaches a higher level of practical applicability.
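
As a rough illustration of how a per-view heatmap is turned into a 2D joint position, the sketch below takes the location of the strongest response. It is a minimal example under stated assumptions, not the decoding used in the paper: the function name `decode_heatmap`, the argmax rule and the toy values are all hypothetical.

```python
def decode_heatmap(heatmap):
    """Return (x, y, score) of the strongest response in a 2D heatmap.

    `heatmap` is a list of rows; argmax decoding is an assumption here,
    the paper may use a different (e.g. sub-pixel) scheme.
    """
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


# Toy 4x5 heatmap for one joint in one view (made-up values).
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.3, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.4, 0.0],
    [0.0, 0.0, 0.3, 0.1, 0.0],
]
x, y, score = decode_heatmap(heatmap)
print(x, y, score)
```

Running the same decoding on the heatmap of every view gives the 2D joint positions that a multi-view fusion step would then combine.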

Highlights

  • The Resnet series of networks [1] is already well established in many fields.

  • In human pose estimation, Resnet performs well in both training speed and effectiveness thanks to its residual structure.

  • In our experimental tests, the video memory occupied by the network in this article (a lightweight 2D and 3D human pose estimation network and pose sample simplification method) is not smaller than that of other networks.

Introduction

The Resnet series of networks [1] is already well established in many fields. In human pose estimation, Resnet performs well in both training speed and effectiveness thanks to its residual structure. The Mobilenet series [2] uses inverted residuals, which expand the channel dimension of the tensor to extract more refined features. The Efficientnetv2 network [3] has a lighter structure. However, Resnet has a relatively large number of parameters, while the Mobilenet and Efficientnetv2 networks are unsatisfactory in terms of fast start-up, and all of these networks leave room for improvement in pose estimation performance. We therefore used them as experimental baselines to demonstrate the advantages of the network designed in this paper in terms of network size and estimation performance.
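
To make the size comparison concrete, the sketch below counts convolution weights in a Resnet-style basic block and in a Mobilenetv2-style inverted residual block. It is only an illustration under stated assumptions: the helper names, the channel width of 64 and the expansion factor of 6 are chosen for the example, bias and batch-normalization parameters are ignored, and none of this describes the LHPE-nets themselves.

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias and batch-norm ignored)."""
    return (c_in // groups) * c_out * k * k


def basic_residual_block(c):
    """Two 3x3 convolutions at constant width, as in a Resnet-34-style block."""
    return 2 * conv_weights(c, c, 3)


def inverted_residual_block(c_in, c_out, expand=6):
    """1x1 expansion, 3x3 depthwise convolution, 1x1 linear projection."""
    hidden = c_in * expand  # the expanded channel dimension
    return (
        conv_weights(c_in, hidden, 1)
        + conv_weights(hidden, hidden, 3, groups=hidden)  # depthwise
        + conv_weights(hidden, c_out, 1)
    )


print(basic_residual_block(64))         # 73728
print(inverted_residual_block(64, 64))  # 52608
```

Even though the inverted residual block widens the tensor sixfold internally, its depthwise 3x3 convolution keeps the weight count below that of the two full 3x3 convolutions at the same input and output width.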
