Multi-Level Fusion Net for hand pose estimation in hand-object interaction

Xiang-Bo Lin, Yi-Dan Zhou, Kuo Du, Yi Sun, Xiao-Hong Ma, Jian Lu

doi:10.1016/j.image.2021.116196

Abstract

This work addresses the challenging problem of estimating the full 3D hand pose while the hand interacts with an unknown object. Compared with pose estimation for an isolated hand, occlusion and interference caused by the manipulated object and the cluttered background make this task more difficult. Our proposed Multi-Level Fusion Net extracts more effective features to overcome these difficulties through a multi-level fusion design within a new end-to-end Convolutional Neural Network (CNN) framework. It takes cropped RGBD data from a single RGBD camera at a free viewpoint as input, and requires neither hand–object pre-segmentation nor pre-built object or hand models. Through extensive evaluations on a public hand–object interaction dataset, we demonstrate the state-of-the-art performance of our method.

Full Text