Unified End-to-End YOLOv5-HR-TCM Framework for Automatic 2D/3D Human Pose Estimation for Real-Time Applications.

Hung-Cuong Nguyen,Thi-Hao Nguyen,Van-Hung Le,Rafal Scherer

doi:10.3390/s22145419

Hung-Cuong Nguyen, Thi-Hao Nguyen + Show 2 more

Open Access

https://doi.org/10.3390/s22145419

Copy DOI

Abstract

Three-dimensional human pose estimation is widely applied in sports, robotics, and healthcare. In the past five years, the number of CNN-based studies for 3D human pose estimation has been numerous and has yielded impressive results. However, studies often focus only on improving the accuracy of the estimation results. In this paper, we propose a fast, unified end-to-end model for estimating 3D human pose, called YOLOv5-HR-TCM (YOLOv5-HRet-Temporal Convolution Model). Our proposed model is based on the 2D to 3D lifting approach for 3D human pose estimation while taking care of each step in the estimation process, such as person detection, 2D human pose estimation, and 3D human pose estimation. The proposed model is a combination of best practices at each stage. Our proposed model is evaluated on the Human 3.6M dataset and compared with other methods at each step. The method achieves high accuracy, not sacrificing processing speed. The estimated time of the whole process is 3.146 FPS on a low-end computer. In particular, we propose a sports scoring application based on the deviation angle between the estimated 3D human posture and the standard (reference) origin. The average deviation angle evaluated on the Human 3.6M dataset (Protocol #1–Pro #1) is 8.2 degrees.

Full Text