Abstract
Hand pose estimation from RGB images has always been a difficult task, owing to the incompleteness of the depth information. Moon et al. improved the accuracy of hand pose estimation by using a new network, InterNet, through their unique design. Still, the network still has potential for improvement. Based on the architecture of MobileNet v3 and MoGA, we redesigned a feature extractor that introduced the latest achievements in the field of computer vision, such as the ACON activation function and the new attention mechanism module, etc. Using these modules effectively with our network, architecture can better extract global features from an RGB image of the hand, leading to a greater performance improvement compared to InterNet and other similar networks.
Highlights
We evaluated our InterNet+ network on RGB datasets used for hand pose estimation, including the stereo hand pose tracking benchmark (STB) and Rendered Handpose Dataset (RHD) datasets, and the feasibility test conducted on the incomplete
We use widely used mean end point error (EPE, according to reference [3], which is defined as a mean Euclidean distance between the predicted and ground-truth 3D hand pose after root joint alignment) as the evaluation metrics for STB dataset and RHD dataset
Considering experience, the best result is generally obtained near the epoch where the learning rate is about to converge to 0; the STB result is taken from 45 epochs, and the RHD training result is taken from 189 epochs
Summary
Using deep learning methods to estimate hand pose based on a RGB image, one of the possible methods is InterNet, which was introduced by [8]. InterNet accurately estimates the posture position of the hand by inputting an annotated RGB image using a deep neural network feature extractor and subsequent heatmap estimation and position-fitting with the fully connected network. For the purpose of achieving the potential of the original InterNet structure and verify whether the current achievements can be effectively applied to the field of hand pose estimation, as well as improving the performance in multiple datasets, we relied on the recent developments in this field to update the original method and achieved greater improvements on multiple datasets.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have