Abstract

Precise 3D hand pose estimation can improve human–computer interaction (HCI), and computer-vision-based estimation in particular makes that interaction more natural. Most traditional computer-vision-based methods take depth images as input, which requires complicated and expensive acquisition equipment; estimation from a single RGB image is more convenient and less expensive. Previous RGB-based methods recover 3D hand poses from 2D keypoint score maps alone and ignore the hand texture features and underlying spatial information in the RGB image, which leads to relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method concatenates RGB images with 2D keypoint features and reassigns the channel weights, so that the different features are weighted and used appropriately. This also improves the fusion of different types of feature maps. Comparative experiments on public datasets demonstrate that the accuracy of the proposed method is comparable to that of state-of-the-art methods.

Highlights

  • Gesture estimation plays a significant role in computer science, and related tasks aim toward understanding human gestures through algorithms

  • We introduce estimation methods based on depth images and RGB images in this chapter

  • Based on 3D hand estimation from a single RGB image, we propose a method that uses an attention mechanism to fuse the 2D score maps and the RGB image at the channel level
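
The channel-level fusion described above can be illustrated with a minimal sketch. The page does not give the exact architecture, so this example substitutes a generic squeeze-and-excitation style channel gate: the tensor shapes, the 21-keypoint count, the reduction ratio `r`, the random weights `w1` and `w2`, and the function name `channel_fusion_attention` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def channel_fusion_attention(rgb_feat, score_maps, w1, w2):
    """Fuse RGB features and 2D keypoint score maps with a channel gate.

    rgb_feat:   (C_rgb, H, W) feature maps from the RGB image (assumed shape)
    score_maps: (K, H, W) 2D keypoint score maps, one per keypoint
    w1, w2:     weights of a hypothetical two-layer gating network
    """
    # Stack both feature types along the channel axis.
    fused = np.concatenate([rgb_feat, score_maps], axis=0)   # (C, H, W)
    # Squeeze: one descriptor per channel via global average pooling.
    desc = fused.mean(axis=(1, 2))                            # (C,)
    # Excite: bottleneck layer with ReLU, then sigmoid weights in (0, 1).
    hidden = np.maximum(w1 @ desc, 0.0)                       # (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))            # (C,)
    # Re-weight every channel of the fused tensor.
    return fused * weights[:, None, None], weights

# Toy inputs: 32 RGB feature channels and 21 keypoint score maps (assumed sizes).
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((32, 16, 16))
score_maps = rng.random((21, 16, 16))
C, r = 32 + 21, 4
w1 = rng.standard_normal((C // r, C)) * 0.1   # placeholder for learned weights
w2 = rng.standard_normal((C, C // r)) * 0.1   # placeholder for learned weights
out, weights = channel_fusion_attention(rgb_feat, score_maps, w1, w2)
print(out.shape, weights.shape)
```

In a trained network the gate weights would be learned, so channels carrying useful texture or keypoint evidence could receive larger weights than uninformative ones; here they are random and only demonstrate the data flow.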

Introduction

Gesture estimation plays a significant role in computer science, and related tasks aim to understand human gestures through algorithms. Gesture-based human–computer interaction (HCI) can take place anywhere and at any time, has fewer constraints, and enables computers to understand user commands efficiently and precisely without any mechanical assistance. Gestures for HCI are quick, vivid, intuitive, flexible, and visual; they enable soundless interaction and bridge the gap between the real and virtual worlds. Computer-vision-based hand pose estimation enables people to communicate with machines more naturally. With the development of computer vision, pose estimation no longer relies on traditional wearable devices in specific scenes but can be implemented directly through image recognition. Research on pose estimation in computer vision falls into three main categories according to the input: depth images, multiview RGB images, and single RGB images.
