Robust 3D Hand Detection from a Single RGB-D Image in Unconstrained Environments

Chi Xu, Yi Liu, Wendi Cai, Yunkai Jiang, Jun Zhou, Yongbo Li

doi:10.3390/s20216360

Abstract

Three-dimensional hand detection from a single RGB-D image is an important technology that supports many useful applications. In practice, it is challenging to detect human hands robustly in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as changes in lighting. To tackle this problem, we propose a 3D hand detection approach that improves robustness and accuracy by adaptively fusing the complementary features extracted from the RGB and depth (D) channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then their 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we present a challenging RGB-D hand detection dataset collected in unconstrained environments. Unlike previous works, which primarily rely on either the RGB or the D channel, we adaptively fuse the RGB-D channels for hand detection. Evaluation results show that the D channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach significantly improves hand detection accuracy from 69.1 to 74.1 compared with one of the state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable under unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method drops significantly to 48.9, and in back-light conditions, the accuracy of the D-based method drops dramatically to 28.3. Compared with these methods, our RGB-D fusion-based approach is much more robust and degrades far less, achieving accuracies of 62.5 and 65.9 in these two extreme lighting conditions, respectively.
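The abstract does not spell out how the adaptive fusion or the 3D lifting is implemented. Purely as an illustration, the sketch below assumes a simple sigmoid-gated fusion (one common way to weight two modalities adaptively), in which a hypothetical 1x1 gating layer with parameters `w_gate` and `b_gate` predicts, per location and channel, how much to trust the RGB features versus the depth features. It also assumes a pinhole camera model with hypothetical intrinsics `fx, fy, cx, cy` to show how a 2D box centre plus an estimated depth z can be turned into a 3D point. Neither function is the authors' network; all names are illustrative.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function: sigmoid(x) == 0.5 * (1 + tanh(x / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def adaptive_fuse(f_rgb, f_d, w_gate, b_gate):
    """Blend RGB and depth feature maps with a per-location gate (assumed scheme).

    f_rgb, f_d : (C, H, W) feature maps from the RGB and depth branches.
    w_gate     : (C, 2C) weights of a hypothetical 1x1 gating convolution.
    b_gate     : (C,) bias of that convolution.
    Returns the fused (C, H, W) map and the gate alpha, with values in (0, 1).
    """
    stacked = np.concatenate([f_rgb, f_d], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a matrix product over the channel axis.
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    alpha = sigmoid(logits)
    # alpha near 1 trusts RGB, alpha near 0 trusts depth, independently per location.
    return alpha * f_rgb + (1.0 - alpha) * f_d, alpha


def lift_to_3d(box, z, fx, fy, cx, cy):
    """Back-project the centre of a 2D box (x1, y1, x2, y2) at depth z
    using assumed pinhole intrinsics; returns camera-frame (X, Y, Z)."""
    u = 0.5 * (box[0] + box[2])
    v = 0.5 * (box[1] + box[3])
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

With all gate parameters at zero, alpha is 0.5 everywhere and the fusion reduces to plain averaging; a trained gate would be expected to shift weight toward whichever modality is more reliable (for example, toward depth in the dark), which is the intuition behind adaptive fusion.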

Highlights

Hands play an important role in people’s daily activities
Failure cases are shown in Figure 9f: when a green box covers its corresponding red box but the intersection over union (IoU) between the two boxes is low, it is counted as a false detection; when a red box is not covered by any green box, it is counted as a missed detection
This paper presents a robust and accurate approach for 3D hand detection from a single RGB-D image
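The IoU-based failure criterion described for Figure 9f can be sketched as follows. This is a minimal illustration, not the paper's evaluation code. It assumes boxes given as (x1, y1, x2, y2), green boxes read as detections and red boxes as ground truth, detections listed in descending confidence order, greedy one-to-one matching, and an IoU threshold of 0.5 (the PASCAL VOC convention; the threshold used in the paper is not stated here). The function names `iou` and `match_detections` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, thr=0.5):
    """Greedily match detections to ground-truth boxes (assumed protocol).

    Returns (true_pos, false_det, missed):
      true_pos  - list of (pred_index, gt_index) pairs with IoU >= thr
      false_det - indices of detections with no sufficiently overlapping box
      missed    - indices of ground-truth boxes left unmatched
    """
    matched_gt = set()
    true_pos, false_det = [], []
    for i, p in enumerate(preds):
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue  # each ground-truth box can be matched at most once
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is not None and best_iou >= thr:
            matched_gt.add(best_j)
            true_pos.append((i, best_j))
        else:
            false_det.append(i)  # e.g. a large box covering a hand with low IoU
    missed = [j for j in range(len(gts)) if j not in matched_gt]
    return true_pos, false_det, missed
```

A detection that fully covers a much smaller ground-truth box still has a low IoU, so under this protocol it is counted as a false detection, matching the first failure mode described for Figure 9f.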

Summary

Introduction

Hand detection is a key component in many computer vision applications, such as human–computer interaction [1], hand pose estimation [2,3,4], hand gesture recognition [5,6], and activity analysis [5]. Most existing works [7,8,9,10,11,12] focus on 2D hand detection from a single RGB image, which lacks 3D information and is therefore insufficient for 3D hand detection. RGB-image-based methods cannot meet the increasing demand for 3D human–computer/robot interaction [13]. In a robotic teaching scenario, for example, inferring the target from a single RGB image can be ambiguous (see Figure 1a). For related technologies such as hand joint estimation and action recognition, please refer to our previous work [2,3,14].

Objectives

Methods

Results

Conclusion