Abstract

The application of 2D pose estimation methods to video often suffers from performance degradation caused by severe video quality degradation. To mitigate this problem, a novel model, the spatiotemporal net (STNet), is proposed. STNet uses convolution modules to extract 2D joint heatmaps from each frame and recurrent convolution modules to encode temporal information between adjacent frames. This decoupled learning of spatiotemporal information improves the temporal coherence and spatial accuracy of the estimated poses and reduces the difficulty of extracting spatiotemporal features. The use of ConvGRU effectively reduces computational cost while preserving recognition accuracy. The proposed model is compared with existing methods on two benchmarks, Penn Action and Sub-JHMDB. The results show that STNet achieves a better trade-off between prediction performance and computational complexity and thus has greater practical value.
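The recurrent convolution module described above can be illustrated with a minimal sketch. This is not the paper's implementation; it is a hypothetical single-channel ConvGRU cell in NumPy, where the dense matrix products of a standard GRU are replaced by 3x3 convolutions, and per-frame "heatmaps" (here random arrays standing in for the spatial CNN's output) are encoded across time. All kernel sizes, shapes, and initializations are illustrative assumptions.

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D cross-correlation, single channel, via explicit loops."""
    H, W = x.shape
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvGRUCell:
    """Minimal single-channel ConvGRU: GRU gate equations with
    convolutions in place of matrix multiplications (illustrative only)."""
    def __init__(self, rng):
        # One 3x3 kernel per gate/input path; sizes are assumptions.
        self.Wz, self.Uz, self.Wr, self.Ur, self.Wh, self.Uh = (
            0.1 * rng.standard_normal((3, 3)) for _ in range(6))

    def step(self, x, h):
        z = sigmoid(conv2d(x, self.Wz) + conv2d(h, self.Uz))       # update gate
        r = sigmoid(conv2d(x, self.Wr) + conv2d(h, self.Ur))       # reset gate
        h_tilde = np.tanh(conv2d(x, self.Wh) + conv2d(r * h, self.Uh))
        return (1 - z) * h + z * h_tilde                           # convex blend

rng = np.random.default_rng(0)
cell = ConvGRUCell(rng)
frames = rng.standard_normal((4, 8, 8))   # T=4 stand-in per-frame heatmaps
h = np.zeros((8, 8))                      # hidden spatial state
for t in range(frames.shape[0]):          # temporal encoding across frames
    h = cell.step(frames[t], h)
print(h.shape)
```

Because each step is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays in (-1, 1); the spatial layout of the heatmap is preserved through time, which is the point of using convolutions in the gates.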
