Abstract

Three-dimensional hand detection from a single RGB-D image is an important technology that supports many useful applications. In practice, it is challenging to robustly detect human hands in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as lighting changes. To tackle this problem, we propose a 3D hand detection approach that improves robustness and accuracy by adaptively fusing the complementary features extracted from the RGB-D channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then the 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we present a challenging RGB-D hand detection dataset collected in unconstrained environments. Unlike previous works, which primarily rely on either the RGB or the D channel, we adaptively fuse the RGB-D channels for hand detection. Evaluation results show that the D channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach improves hand detection accuracy from 69.1 to 74.1 compared with one of the state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable in unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method drops to 48.9, and in back-light conditions, the accuracy of the D-based method drops to 28.3. Compared with these methods, our RGB-D fusion-based approach is much more robust, without severe accuracy degradation: its detection accuracies are 62.5 and 65.9, respectively, in these two extreme lighting conditions.

Highlights

  • Hands play an important role in people’s daily activities

  • Failure cases are shown in Figure 9f: a green box covers its corresponding red box, but the intersection over union (IoU) between the two boxes is low, so it is counted as a false detection; a red box that is not covered by any green box is counted as a missed detection

  • This paper presents a robust and accurate approach for 3D hand detection from a single
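
The IoU criterion mentioned in the failure-case highlight can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), reads the green boxes as detections and the red boxes as ground truth (as the highlight implies), and uses a matching threshold of 0.5, which is a common convention and not a value stated in the text. The function names `iou` and `is_correct_detection` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(detected, ground_truth, threshold=0.5):
    """True if the detection overlaps the ground truth enough.

    The 0.5 threshold is an assumed convention, not taken from the text.
    """
    return iou(detected, ground_truth) >= threshold


# A large detected box that fully covers a small ground-truth box:
# the overlap is the whole small box, but the union is dominated by
# the large box, so the IoU is low and the detection is rejected.
detected = (0, 0, 100, 100)
ground_truth = (40, 40, 60, 60)
print(iou(detected, ground_truth))
print(is_correct_detection(detected, ground_truth))
```

This shows why full coverage alone is not enough: a detection that is much larger than the hand it contains still scores a low IoU and is counted as a false detection.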


Summary

Introduction

Hand detection is a key component in many computer vision applications, such as human–computer interaction [1], hand pose estimation [2,3,4], hand gesture recognition [5,6], and activity analysis [5]. Most existing works [7,8,9,10,11,12] focus on 2D hand detection from a single RGB image, which lacks 3D information and is therefore inadequate for 3D hand detection. RGB-image-based methods cannot meet the increasing requirements of 3D human–computer/robot interaction [13]. For example, in a robotic teaching scenario, inferring the target from a single RGB image is ambiguous (see Figure 1a). For other related technologies, such as the estimation of hand joints and action recognition, please refer to our previous work [2,3,14].
