Abstract

This paper introduces SRHandNet, a novel method for real-time 2D hand pose estimation from monocular color images. Existing methods cannot efficiently obtain satisfactory results for small hands. Our key idea is to simultaneously regress the hand regions of interest (RoIs) and the hand keypoints for a given color image, and to iteratively feed the hand RoIs back into a single encoder-decoder network to improve hand keypoint estimation. In contrast to previous region proposal networks (RPNs), we propose a new lightweight bounding box representation called the region map. The region map and the hand keypoint heatmaps are combined into unified multi-channel feature maps, which can be obtained with a single forward pass of the network, thereby improving runtime efficiency. SRHandNet runs at 40 fps for hand bounding box detection and at up to 30 fps for accurate hand keypoint estimation in a desktop environment without implementation optimization. Experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art results, outperforming all recent methods.
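
As an illustration of how one unified multi-channel output could yield both keypoints and a bounding box from a single forward pass, the following is a minimal sketch rather than the network's actual decoding step. The channel layout is an assumption made for this example: the abstract does not specify how the region map is encoded, so the three trailing channels (center confidence, box width, box height), the count of 21 keypoints, and the function name `decode_output` are all hypothetical.

```python
import numpy as np

NUM_KEYPOINTS = 21  # assumption: a common hand-keypoint count, not given in the abstract


def decode_output(maps, num_keypoints=NUM_KEYPOINTS):
    """Split a (C, H, W) output array into keypoints and one bounding box.

    Assumed (hypothetical) layout: channels [0, K) are keypoint heatmaps;
    the next three channels are center confidence, box width, and box height,
    with width and height stored as fractions of the map size.
    """
    heatmaps = maps[:num_keypoints]
    center, width, height = maps[num_keypoints:num_keypoints + 3]
    h, w = center.shape

    # Keypoints: the peak location of each heatmap, returned as (x, y).
    flat = heatmaps.reshape(num_keypoints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat, (h, w))
    keypoints = np.stack([xs, ys], axis=1)

    # Box: read width and height at the peak of the center-confidence channel.
    cy, cx = np.unravel_index(center.argmax(), (h, w))
    bw = width[cy, cx] * w
    bh = height[cy, cx] * h
    box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
    return keypoints, box
```

Under these assumptions, the returned box could be used to crop the input image and run the same network again on the crop, which is one way to read the iterative RoI feedback described above.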
