Abstract

This paper proposes a cascaded, parameter-parsimonious 3D hand pose estimation strategy that improves real-time performance without sacrificing accuracy. The estimation process is first decomposed into feature extraction and feature exploitation. Feature extraction is treated as a dimension reduction process, in which convolutional neural networks (CNNs) are used to ensure accuracy. Feature exploitation is treated as a policy optimization process, and a shallow reinforcement learning (RL)-based feature exploitation module is proposed to improve running speed. Ablation studies and experiments are carried out on the NYU and ICVL datasets to evaluate the performance of the strategy, and multiple baselines are used to evaluate its generalization. The results show that the proposed strategy improves testing time by 8.1% and 14.6%. The overall accuracy also reaches the state of the art, which further demonstrates the effectiveness of the proposed strategy.

