Abstract

Hand pose estimation from RGB images has long been a difficult task because such images lack complete depth information. Moon et al. improved the accuracy of hand pose estimation with a new network, InterNet. The network nevertheless leaves room for improvement. Based on the architectures of MobileNet v3 and MoGA, we redesigned the feature extractor to incorporate recent advances in computer vision, such as the ACON activation function and a new attention mechanism module. Used effectively within our network architecture, these modules better extract global features from an RGB image of the hand, yielding better performance than InterNet and other similar networks.

Highlights

  • We evaluated our InterNet+ network on RGB datasets used for hand pose estimation, including the Stereo Hand Pose Tracking Benchmark (STB) and the Rendered Handpose Dataset (RHD), and the feasibility test conducted on the incomplete

  • We use the widely used mean end point error (EPE), defined in reference [3] as the mean Euclidean distance between the predicted and ground-truth 3D hand poses after root joint alignment, as the evaluation metric for the STB and RHD datasets

  • In our experience, the best result is generally obtained near the epoch at which the learning rate is about to reach 0; the STB result is taken from 45 epochs of training, and the RHD result from 189 epochs
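
The EPE metric described in the highlights can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' evaluation code: the function name `mean_epe`, the array layout `(N, J, 3)`, and the choice of joint 0 as the root are assumptions made for the example. Note that this sketch averages over all joints, including the root, whose error is zero after alignment; some evaluation scripts may treat the root differently.

```python
import numpy as np


def mean_epe(pred, gt, root_idx=0):
    """Mean end point error after root joint alignment.

    pred, gt: arrays of shape (N, J, 3) holding N samples of J 3D joints.
    root_idx: index of the root joint (assumed to be 0 here).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Root joint alignment: express every joint relative to the root joint.
    pred_aligned = pred - pred[:, root_idx:root_idx + 1, :]
    gt_aligned = gt - gt[:, root_idx:root_idx + 1, :]

    # Euclidean distance per joint, then the mean over samples and joints.
    per_joint_error = np.linalg.norm(pred_aligned - gt_aligned, axis=-1)
    return float(per_joint_error.mean())


# Hypothetical data: one sample with 21 joints, all at the origin.
gt_pose = np.zeros((1, 21, 3))
pred_pose = gt_pose.copy()
pred_pose[0, 5] = [3.0, 4.0, 0.0]  # one joint is off by a distance of 5

single_joint_epe = mean_epe(pred_pose, gt_pose)

# A rigid translation of the whole prediction does not change the metric,
# because both poses are re-centred on their own root joint.
shifted_epe = mean_epe(pred_pose + 10.0, gt_pose)
```

Because of the alignment step, the metric measures the shape of the predicted hand relative to its root rather than its absolute position in space.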

Summary

Introduction

One deep learning method for estimating hand pose from an RGB image is InterNet, introduced in [8]. InterNet takes an annotated RGB image as input and accurately estimates the hand pose with a deep neural network feature extractor, followed by heatmap estimation and position fitting with a fully connected network. To realize the potential of the original InterNet structure, to verify whether recent advances can be applied effectively to hand pose estimation, and to improve performance on multiple datasets, we updated the original method using recent developments in the field and achieved improvements on multiple datasets.

Related Work
Original InterNet
Network Structure
Specific
Redesigned Feature Extraction Network
Inverted Residual Block
Coordinate
Feature
Processing of the Feature Maps
Effective Way of Network Training
Experiment
Datasets
STB Dataset
RHD Dataset
Experimental Environment and Results
Methods
Convergence
Coordinate Attention Mechanism Module
Processing of Feature Map by Using the FcaNet Layer
Discussion and Outlook
