LHPE-nets: A lightweight 2D and 3D human pose estimation model with well-structural deep networks and multi-view pose sample simplification method.

Hao Zhang, Li-yan Dong, Ming-hui Sun, Nguyen Quoc Khanh Le, Hao Wang

doi:10.1371/journal.pone.0264302

Open Access

https://doi.org/10.1371/journal.pone.0264302

Abstract

Cross-view 3D human pose estimation models have made significant progress: through multi-view fusion, they better accomplish human joint localization and skeleton modeling in 3D. The multi-view 2D pose estimation part of such a model is very important, but its training cost is also very high. It uses deep learning networks to generate heatmaps for each view. In this article, we therefore tested several newer deep learning networks on pose estimation tasks, including Mobilenetv2, Mobilenetv3, Efficientnetv2 and Resnet. Based on the performance and drawbacks of these networks, we then built multiple deep learning networks with better performance. We call our networks LHPE-nets; they mainly comprise the Low-Span network and the RDNS network. LHPE-nets use a network structure with evenly distributed channels, inverted residuals, external residual blocks and a framework for processing small-resolution samples to reach training saturation faster. We also designed a static pose sample simplification method for 3D pose data, which stores samples at low cost and makes them convenient for models to read. In the experiments, we used several recent models for comparison and two public evaluation indicators. The results show the advantage of this work in fast start-up and network size: it is about 1-5 epochs faster than Resnet-34 during training. They also show improved accuracy in estimating different joints: estimation performance improves for approximately 60% of the joints. Its overall human pose estimation performance exceeds that of other networks by more than 7 mm. The experiments analyze in detail the network size, fast start-up, and 2D and 3D pose estimation performance of the proposed model. Compared with other pose estimation models, its performance also reaches a higher level of applicability.

Highlights

The Resnet series of networks [1] is already maturely applied in many fields.
In human pose estimation, the Resnet series performs well in training speed and effectiveness thanks to its residual structure.
In our experimental tests, the video memory occupied by the network in this article is not smaller than that of other networks.

Summary

Introduction

The Resnet series of networks [1] is already maturely applied in many fields. In human pose estimation, it performs well in training speed and effectiveness thanks to its residual structure. The Mobilenet series [2] uses inverted residuals, which extract more refined features by expanding the dimension of the tensor. The Efficientnetv2 [3] network has a lighter structure. Resnet has a relatively large number of parameters, while the Mobilenet and Efficientnetv2 networks are unsatisfactory in terms of fast start-up. All of these networks leave room for improvement in pose estimation performance. We used them as experimental baselines to demonstrate the advantages of the network designed in this paper in terms of network size and estimation performance.
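
To make the trade-off between parameter count and inverted residuals concrete, the following minimal sketch counts convolution weights for a Resnet-style basic block and a Mobilenetv2-style inverted residual block. It is not the LHPE-nets design: the function names, the channel width of 64 and the expansion factor of 6 are assumptions chosen for illustration, and biases and batch-normalization parameters are ignored.

```python
# Illustrative weight counts only; biases and batch-norm parameters are ignored.
# Function names, channel width and expansion factor are hypothetical choices,
# not values taken from the paper.

def conv_weights(in_ch, out_ch, kernel, groups=1):
    """Number of weights in a 2D convolution with a square kernel."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * kernel * kernel


def basic_residual_block_weights(channels):
    """Resnet-style basic block: two 3x3 convolutions, channels -> channels."""
    return 2 * conv_weights(channels, channels, 3)


def inverted_residual_block_weights(channels, expansion=6):
    """Mobilenetv2-style block: 1x1 expand, 3x3 depthwise, 1x1 project."""
    hidden = channels * expansion
    expand = conv_weights(channels, hidden, 1)
    # Depthwise convolution: one 3x3 filter per channel (groups == channels).
    depthwise = conv_weights(hidden, hidden, 3, groups=hidden)
    project = conv_weights(hidden, channels, 1)
    return expand + depthwise + project


if __name__ == "__main__":
    c = 64  # assumed channel width for the comparison
    print("basic residual block:   ", basic_residual_block_weights(c))
    print("inverted residual block:", inverted_residual_block_weights(c))
```

Under these assumptions the inverted residual block expands the tensor to six times its channel width internally, yet needs fewer weights than the basic block, because the 3x3 convolution is applied depthwise rather than across all channel pairs.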