Abstract

Human pose estimation in continuous video frames captured in complex coal mine scenes is challenging. Video frames from such scenes often suffer from uneven brightness, blurred image details, and excessive noise. Mainstream pose estimation methods perform well on high-quality static images, but their accuracy and prediction rate can drop significantly in coal mine scenes. In this work, a yielding human pose estimation framework, termed YH-Pose, is proposed. The framework aims to incorporate additional visual evidence from neighboring frames to facilitate pose estimation in the current frame. First, a human detector locates the person in the video frame and extracts global features. This provides a good initialization for the subsequent keypoint detection and helps the training process converge quickly. Second, heatmaps encode the joint locations as Gaussian peaks, and a temporal road module (TRM) is designed to encode video frames at intervals. The module efficiently fuses spatio-temporal information through frame-rate groups in a hierarchical manner. Lastly, the spatial road module (SRM) learns from the fused keypoint context features and predicts the keypoint locations in the next frame. In addition, a dataset called Colliery-1 is proposed; it is derived from underground surveillance video from Chinese coal mines and consists of 3600 video clips. Experimental results on the Colliery-1 dataset indicate that the framework achieves an average accuracy of 82% and 80% on the training and test sets, respectively, and a 94.2% prediction rate for pose estimation. To further evaluate the effectiveness of the proposed method, it is compared with various mainstream methods using different metrics.
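
The heatmap encoding mentioned in the abstract can be illustrated with a minimal sketch: one joint location is rendered as a Gaussian peak on a grid and recovered by taking the location of the maximum value. The function names, the grid size, and the value of `sigma` are assumptions chosen for illustration; the abstract does not state the settings used in YH-Pose.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Return a height x width grid with a Gaussian peak centred on (cx, cy).

    sigma is an assumed spread; the paper's actual value is not given here.
    """
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap


def decode_peak(heatmap):
    """Recover the (x, y) position of the largest heatmap value."""
    best_xy = (0, 0)
    best_val = -1.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_val = value
                best_xy = (x, y)
    return best_xy


# Encode a hypothetical joint at x=20, y=30 on a 64 x 48 grid, then decode it.
hm = gaussian_heatmap(64, 48, cx=20, cy=30)
peak = decode_peak(hm)
```

In a full pipeline, one such heatmap would typically be produced per joint and the network trained to regress them. The temporal and spatial fusion performed by TRM and SRM is not shown in this sketch.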
