Abstract

Hand segmentation is one of the most fundamental and crucial steps in egocentric human-computer interaction. The egocentric view brings new challenges to hand segmentation, such as unpredictable environmental conditions. The performance of traditional hand segmentation methods depends on abundant manually labeled training data. However, these approaches neglect the user-specific context and therefore fail to capture the full properties of egocentric human-computer interaction; in practice, only a personalized hand model of the active user needs to be built. Based on this observation, we propose an online-learning hand segmentation approach that requires no manually labeled training data. Our approach consists of top-down classifications and bottom-up optimizations. More specifically, we divide the segmentation task into three parts: a frame-level hand detector, which detects the presence of the interacting hand using motion saliency and initializes hand masks for online learning; a superpixel-level hand classifier, which coarsely segments hand regions from which stable samples are selected for the next level; and a pixel-level hand classifier, which produces a fine-grained hand segmentation. Based on the pixel-level classification result, we update the hand appearance model and optimize the upper-level classifier and detector. This online-learning strategy makes our approach robust to varying illumination conditions and hand appearances. Experimental results demonstrate the robustness of our approach.
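
To make the cascade concrete, the sketch below shows one possible way to structure such an online-learning pipeline in Python. It is a minimal illustration, not the authors' implementation: the CascadedHandSegmenter class and the detect/classify/update interfaces of its three components are hypothetical stand-ins for the frame-level detector, superpixel-level classifier, and pixel-level classifier described above.

    import numpy as np

    class CascadedHandSegmenter:
        """Top-down classification with bottom-up, online optimization (sketch)."""

        def __init__(self, frame_detector, superpixel_classifier, pixel_classifier):
            # The three cascade levels; each object is assumed to expose the
            # hypothetical detect/classify/update interface used below.
            self.frame_detector = frame_detector
            self.superpixel_classifier = superpixel_classifier
            self.pixel_classifier = pixel_classifier

        def process_frame(self, frame):
            # 1. Frame-level: detect whether an interacting hand is present
            #    (e.g. via motion saliency) and initialize a hand mask.
            hand_present, init_mask = self.frame_detector.detect(frame)
            if not hand_present:
                return np.zeros(frame.shape[:2], dtype=bool)

            # 2. Superpixel-level: coarsely segment hand regions and select
            #    stable samples for the next level.
            stable_samples = self.superpixel_classifier.classify(frame, init_mask)

            # 3. Pixel-level: produce the fine-grained hand mask.
            fine_mask = self.pixel_classifier.classify(frame, stable_samples)

            # Bottom-up optimization: use the pixel-level result to update the
            # hand appearance model and the upper-level classifier and detector.
            self.superpixel_classifier.update(frame, fine_mask)
            self.frame_detector.update(frame, fine_mask)
            return fine_mask

Feeding the pixel-level result back into the upper levels is what lets the models track the active user's hand appearance as illumination and pose change, without any manually labeled data.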

Highlights

  • Wearable computers with embedded first-person cameras, such as augmented reality headsets and smart glasses, are developing rapidly and urgently require interaction patterns suited to egocentric vision

  • We evaluate our cascaded hand segmentation method on two types of egocentric data which correspond to different levels of human-computer interaction

  • We present an unsupervised on-the-fly hand segmentation method which consists of top-down classification and bottom-up optimization


Introduction

Wearable computers with embedded first-person cameras, such as augmented reality headsets and smart glasses, are developing rapidly and urgently require interaction patterns suited to egocentric vision. One feasible option is to use the user's hand as the medium for human-computer interaction: the wearable computer interprets hand position, posture, and gesture as commands and produces appropriate responses to the user. Extracting these hand properties requires reliable hand detection and segmentation from the egocentric video. The egocentric view also brings opportunities for hand detection and segmentation. Since the video is recorded from a first-person perspective, occlusions of the attended hand are less likely, and the user tends to concentrate on the region at the center of the field of view.
