Abstract

Hand segmentation is a crucial task in first-person vision. Since first-person images exhibit strong appearance bias across environments, a pre-trained segmentation model must be adapted to each new domain. Here, we focus on the appearance gaps of hand regions and backgrounds separately. We propose (i) foreground-aware image stylization and (ii) consensus pseudo-labeling for domain adaptation of hand segmentation. We stylize source images independently for the foreground and background, using target images as the style. To resolve the domain shift that stylization alone does not address, we apply careful pseudo-labeling by taking a consensus between the models trained on the source and stylized source images. We validated our method on domain adaptation of hand segmentation from real and simulated images, achieving state-of-the-art performance in both settings. We also demonstrated promising results in the more challenging multi-target domain adaptation and domain generalization settings.
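The consensus pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the 0.9 confidence threshold, and the use of per-pixel hand probabilities are all assumptions. Pixels where the source-trained and stylized-source-trained models disagree, or where either model is unconfident, are marked with an ignore label and excluded from self-training.

```python
import numpy as np

def consensus_pseudo_labels(prob_src, prob_styl, threshold=0.9, ignore_index=255):
    """Keep pixels where the model trained on source images and the model
    trained on stylized source images agree with high confidence.

    prob_src, prob_styl: (H, W) arrays of predicted hand probabilities.
    threshold: hypothetical confidence cutoff for accepting a pseudo-label.
    """
    pred_src = prob_src > 0.5
    pred_styl = prob_styl > 0.5
    agree = pred_src == pred_styl
    # Confident if both models are sure of hand, or both are sure of background.
    confident = np.maximum(
        np.minimum(prob_src, prob_styl),          # joint confidence in "hand"
        np.minimum(1 - prob_src, 1 - prob_styl),  # joint confidence in "background"
    ) > threshold
    labels = np.full(prob_src.shape, ignore_index, dtype=np.int64)
    keep = agree & confident
    labels[keep] = pred_src[keep].astype(np.int64)
    return labels
```

Only the pixels that survive both the agreement and confidence checks receive a hand/background label; the rest carry the ignore index and contribute no gradient when the adapted model is fine-tuned on the target domain.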

Highlights

  • Mobile cameras have become popular thanks to advances in photography, and a massive number of videos are recorded nowadays

  • To handle the domain shift in first-person vision, we propose a two-stage semi-supervised domain adaptation method for hand segmentation with (i) foreground-aware stylization and (ii) consensus pseudo-labeling (Fig. 1)

  • We address domain adaptation tasks in hand segmentation where a real dataset [27] and two synthetic datasets [32] serve as source datasets, and four first-person vision datasets [7], [8], [14], [31] serve as target datasets
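The foreground-aware stylization in step (i) transfers target appearance to the hand and background regions independently. As a simplified stand-in for the stylization network, the sketch below matches per-channel color statistics region by region; the function names are hypothetical, and it assumes a hand mask is also available for the target image (e.g. predicted by the source-trained model), which the paper may handle differently.

```python
import numpy as np

def match_stats(region_src, region_tgt):
    # Shift the source pixels to the target region's per-channel mean/std.
    mu_s, sd_s = region_src.mean(axis=0), region_src.std(axis=0) + 1e-6
    mu_t, sd_t = region_tgt.mean(axis=0), region_tgt.std(axis=0)
    return (region_src - mu_s) / sd_s * sd_t + mu_t

def foreground_aware_stylize(src_img, src_mask, tgt_img, tgt_mask):
    """Stylize hand (foreground) and background regions independently.

    src_img, tgt_img: (H, W, 3) float arrays; masks: (H, W) bool (True = hand).
    """
    out = src_img.astype(np.float64).copy()
    src_flat = src_img.reshape(-1, 3).astype(np.float64)
    tgt_flat = tgt_img.reshape(-1, 3).astype(np.float64)
    out_flat = out.reshape(-1, 3)  # view into `out`, so writes propagate
    for region in (True, False):   # foreground first, then background
        s = src_mask.reshape(-1) == region
        t = tgt_mask.reshape(-1) == region
        if s.any() and t.any():
            out_flat[s] = match_stats(src_flat[s], tgt_flat[t])
    return out
```

Because each region borrows statistics only from the corresponding region of the target image, the hand keeps a hand-like appearance and the background a background-like one, rather than blending the two styles as whole-image stylization would.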

Introduction

Mobile cameras have become popular thanks to advances in photography, and a massive number of videos are recorded nowadays. First-person vision [23], which captures interactions from the user’s point of view by utilizing body-worn cameras, is gaining interest. Analyzing first-person videos offers opportunities for assisting people in their daily lives and is useful in various applications, such as assistive technology [26], augmented reality [61], visual lifelogging [6], and human-robot interaction [57]. When analyzing first-person videos, hands play a fundamental role in understanding the wearer’s actions and intentions. Segmenting the hand region is crucial for several downstream tasks [4] such as hand pose estimation, 3D hand shape reconstruction, and hand-object interaction recognition. A segmentation model must correctly segment the hand regions of diverse users and environments.
