Abstract

The application of 2D pose estimation methods to video often suffers from performance degradation caused by severe video quality degradation. To mitigate this problem, a novel model, the spatiotemporal net (STNet), is proposed. STNet uses convolution modules to extract 2D joint heatmaps from each frame and recurrent convolution modules to encode temporal information between adjacent frames. This decoupled learning of spatiotemporal information improves the temporal coherence and spatial accuracy of the estimated poses and reduces the difficulty of extracting spatiotemporal features. The use of ConvGRU effectively reduces computational cost while preserving recognition accuracy. The proposed model is compared with existing methods on two benchmarks, Penn Action and Sub-JHMDB. The results show that STNet achieves a better trade-off between prediction performance and computational complexity and thus has greater practical value.
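The recurrent convolution module described above can be illustrated with a minimal sketch. This is not the paper's implementation; it is a hypothetical single-channel ConvGRU cell in NumPy, where the dense matrix products of a standard GRU are replaced by 3x3 convolutions, and per-frame "heatmaps" (here random arrays standing in for the spatial CNN's output) are encoded across time. All kernel sizes, shapes, and initializations are illustrative assumptions.

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D cross-correlation, single channel, via explicit loops."""
    H, W = x.shape
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvGRUCell:
    """Minimal single-channel ConvGRU: GRU gate equations with
    convolutions in place of matrix multiplications (illustrative only)."""
    def __init__(self, rng):
        # One 3x3 kernel per gate/input path; sizes are assumptions.
        self.Wz, self.Uz, self.Wr, self.Ur, self.Wh, self.Uh = (
            0.1 * rng.standard_normal((3, 3)) for _ in range(6))

    def step(self, x, h):
        z = sigmoid(conv2d(x, self.Wz) + conv2d(h, self.Uz))       # update gate
        r = sigmoid(conv2d(x, self.Wr) + conv2d(h, self.Ur))       # reset gate
        h_tilde = np.tanh(conv2d(x, self.Wh) + conv2d(r * h, self.Uh))
        return (1 - z) * h + z * h_tilde                           # convex blend

rng = np.random.default_rng(0)
cell = ConvGRUCell(rng)
frames = rng.standard_normal((4, 8, 8))   # T=4 stand-in per-frame heatmaps
h = np.zeros((8, 8))                      # hidden spatial state
for t in range(frames.shape[0]):          # temporal encoding across frames
    h = cell.step(frames[t], h)
print(h.shape)
```

Because each step is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays in (-1, 1); the spatial layout of the heatmap is preserved through time, which is the point of using convolutions in the gates.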
