Abstract

Existing human pose estimation methods usually carry a high computational load, which makes them poorly suited to resource-limited devices. To address this issue, we propose DSPNet, a deep supervision pyramid network with low computational cost. First, we design a lightweight up-sampling unit that replaces transposed convolution in the network's decoder; it reduces computation while also improving prediction accuracy. Second, we present a novel deep supervision pyramid architecture that improves the ability of MSRA SimpleBaseline to capture multi-scale information without increasing the number of parameters. Experimental results on both the MPII and COCO pose estimation benchmarks show that DSPNet achieves accuracy close to the state of the art at a low computational cost. In particular, when both methods use the same backbone network (EfficientNet), DSPNet requires only 12.7% of the computational cost of SimpleBaseline and improves estimation accuracy by 0.9 points on the MPII validation set. The code of the proposed method is available at https://github.com/sumaliqinghua/DSPNet.
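The abstract does not specify how the lightweight up-sampling unit is built, so the following is only a back-of-the-envelope sketch of why swapping transposed convolution for a cheaper up-sampling stage can cut decoder computation. It assumes a SimpleBaseline-style decoder stage (4x4 transposed convolution, stride 2) and compares it with a hypothetical stand-in: 2x nearest-neighbour up-sampling (counted as zero cost) followed by a depthwise-separable 3x3 convolution. The function names, the 8x6 input map, and the 256-channel widths are illustrative assumptions, not DSPNet's actual design, and the resulting ratio is unrelated to the 12.7% whole-network figure reported above.

```python
"""Rough multiply-accumulate (MAC) comparison of two decoder stage types.

Assumptions (illustrative only, NOT taken from DSPNet):
- deconv stage: SimpleBaseline-style 4x4 transposed convolution, stride 2.
- lightweight stage (hypothetical): 2x nearest-neighbour up-sampling,
  counted as zero MACs, then depthwise 3x3 + pointwise 1x1 convolution.
- Bias, batch norm and activation costs are ignored.
"""
from typing import Callable, Sequence


def deconv_macs(h_in: int, w_in: int, c_in: int, c_out: int, k: int = 4) -> int:
    # Every input pixel of every input channel is scattered through a
    # k x k kernel into each of the c_out output channels.
    return h_in * w_in * c_in * c_out * k * k


def lightweight_up_macs(h_in: int, w_in: int, c_in: int, c_out: int, scale: int = 2) -> int:
    # Hypothetical unit: interpolation first, then a separable conv at the
    # up-sampled resolution.
    h_out, w_out = h_in * scale, w_in * scale
    depthwise = h_out * w_out * c_in * 3 * 3  # one 3x3 filter per input channel
    pointwise = h_out * w_out * c_in * c_out  # 1x1 conv mixes channels
    return depthwise + pointwise


def decoder_macs(stage_fn: Callable[[int, int, int, int], int],
                 h: int, w: int, channels: Sequence[int]) -> int:
    """Sum MACs over consecutive 2x up-sampling stages.

    channels[i] is the input width of stage i and channels[i + 1] its output width.
    """
    total = 0
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        total += stage_fn(h, w, c_in, c_out)
        h, w = h * 2, w * 2  # each stage doubles the spatial resolution
    return total


if __name__ == "__main__":
    widths = [256, 256, 256, 256]  # three stages: 8x6 -> 16x12 -> 32x24 -> 64x48
    heavy = decoder_macs(deconv_macs, 8, 6, widths)
    light = decoder_macs(lightweight_up_macs, 8, 6, widths)
    print(heavy, light, round(light / heavy, 3))  # 1056964608 273530880 0.259
```

Under these assumptions the hypothetical stage costs about a quarter of the transposed convolution, mainly because a 4x4 stride-2 transposed convolution spends 16 kernel taps per input pixel and channel pair, while the separable alternative spends roughly 4 (one 1x1 tap at each of the four up-sampled positions) plus a small depthwise term.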
