Abstract

Human pose estimation from video sequences has become an active research topic in robotics and computer vision. However, existing three-dimensional (3D) pose estimation methods usually analyze individual frames, yielding low accuracy under varying human movement speeds and limiting their practical application. In this paper, we propose a method for estimating 3D pose and computing similarity from Tai Chi video sequences based on a Seq2Seq network. Specifically, taking the 2D joint coordinate sequences of the original images as input, our method builds a Seq2Seq network from an encoder and a decoder. It introduces an attention mechanism that weights the input data to obtain an intermediate vector, which the decoder uses to estimate the 3D joint sequence. Afterwards, given a template video and a target video, our method computes the cost of passing through each point within the constraints to construct a cost matrix for video similarity. From the cost matrix, it determines the optimal path and uses the resulting frame correspondence between the video sequences to compute the image similarity of corresponding frames. Experimental results show that the proposed method effectively improves the accuracy of 3D pose estimation and increases the speed of video similarity calculation.
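
The abstract describes two components: a Seq2Seq network with attention that lifts 2D joint sequences to 3D joint sequences, and a constrained cost-matrix alignment between a template video and a target video. The sketches below illustrate both ideas in Python. They are not the authors' implementation: the framework (PyTorch/NumPy), the GRU encoder and decoder, the additive attention form, the 17-joint skeleton, the band-style path constraint, and all function and parameter names are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive attention over encoder outputs (one common choice; the paper's exact form is not given)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim * 2, 1)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, hidden), encoder_outputs: (batch, seq_len, hidden)
        seq_len = encoder_outputs.size(1)
        query = decoder_hidden.unsqueeze(1).expand(-1, seq_len, -1)
        scores = self.score(torch.cat([query, encoder_outputs], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=1)                     # weight each input frame
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # intermediate vector
        return context, weights

class Seq2SeqLifter(nn.Module):
    """Encode a 2D joint sequence and decode a 3D joint sequence of the same length."""
    def __init__(self, num_joints=17, hidden_dim=256):
        super().__init__()
        self.num_joints = num_joints
        self.encoder = nn.GRU(num_joints * 2, hidden_dim, batch_first=True)
        self.decoder = nn.GRUCell(num_joints * 3 + hidden_dim, hidden_dim)
        self.attention = Attention(hidden_dim)
        self.out = nn.Linear(hidden_dim, num_joints * 3)

    def forward(self, pose2d_seq):
        # pose2d_seq: (batch, seq_len, num_joints * 2)
        batch, seq_len, _ = pose2d_seq.shape
        enc_out, h = self.encoder(pose2d_seq)          # enc_out: (batch, seq_len, hidden)
        hidden = h.squeeze(0)
        prev3d = pose2d_seq.new_zeros(batch, self.num_joints * 3)
        outputs = []
        for t in range(seq_len):
            context, _ = self.attention(hidden, enc_out)               # weigh the input data
            hidden = self.decoder(torch.cat([prev3d, context], dim=-1), hidden)
            prev3d = self.out(hidden)                                   # 3D joints for frame t
            outputs.append(prev3d)
        return torch.stack(outputs, dim=1)             # (batch, seq_len, num_joints * 3)

# Usage example (random data, 2 clips of 50 frames with 17 joints):
# model = Seq2SeqLifter()
# pose3d = model(torch.randn(2, 50, 34))   # -> shape (2, 50, 51)
```

For the similarity step, the abstract's cost matrix, constraint, and optimal path closely resemble dynamic time warping. The sketch below takes that reading: it fills a band-constrained cost matrix over per-frame pose distances, backtracks the optimal path, and converts the cost along the path into a per-frame similarity. The distance function, band width, and similarity mapping are assumptions.

```python
import numpy as np

def pose_distance(a, b):
    """Per-frame cost between two poses of shape (num_joints, 3): mean joint distance (illustrative)."""
    return np.linalg.norm(a - b, axis=-1).mean()

def align_sequences(template, target, band=30):
    """Fill a band-constrained cost matrix, then backtrack the optimal warping path."""
    n, m = len(template), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - band), min(m, i + band)    # constraint window around the diagonal
        for j in range(lo, hi + 1):
            cost = pose_distance(template[i - 1], target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the optimal path to obtain frame correspondences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], D[n, m]

def frame_similarity(template, target, path):
    """Map the cost of each corresponding frame pair to a similarity score in (0, 1]."""
    return [1.0 / (1.0 + pose_distance(template[i], target[j])) for i, j in path]
```

The band constraint limits how far the path may drift from the diagonal, which both enforces a plausible temporal correspondence and reduces the number of matrix cells that must be evaluated; this is one way the similarity computation can be sped up relative to an unconstrained alignment.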
