Abstract

This paper discusses the problem of one-shot gesture recognition using a human-centered approach and its potential application to fields such as human-robot interaction, where the user’s intentions are indicated through spontaneous gesturing (one-shot). Casual users have limited time to learn a gesture interface, which makes one-shot recognition an attractive alternative to interface customization. To achieve natural interaction with machines, a framework must be developed that incorporates the human ability to understand gestures from a single observation. Previous approaches to one-shot gesture recognition have relied heavily on statistical and data-mining-based solutions, ignoring the mechanisms that humans use to perceive and execute gestures, which can provide valuable context information. This omission has led to suboptimal solutions. The focus of this work is on the process that leads to the realization of a gesture, rather than on the gesture itself. In this case, context involves the way in which humans produce gestures: their kinematic and anthropometric characteristics. In the method presented here, the strategy is to generate a data set of realistic samples based on features extracted from a single gesture sample. These features, called the “gist of a gesture,” are considered to represent what humans remember when seeing a gesture and, later, the cognitive process involved when trying to replicate it. By adding meaningful variability to these features, a large training data set is created while preserving the fundamental structure of the original gesture. The availability of a large data set of realistic samples allows classifiers to be trained for future recognition. The performance of the method is evaluated using different lexicons, and its efficiency is compared with that of traditional N-shot learning approaches. The strength of the approach is further illustrated through human and machine recognition of gestures performed by a dual-arm robotic platform.
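
The paper itself does not include code, but the augmentation idea in the abstract can be illustrated with a short sketch. The snippet below is a hypothetical Python implementation: it assumes the "gist" is reduced to a (T, D) array of hand positions over time (the actual feature set is defined in the paper's Methods) and adds kinematically plausible variability through amplitude scaling, smooth temporal warping, and low-frequency jitter. Function and parameter names are illustrative, not the authors'.

```python
import numpy as np

def augment_gesture(traj, n_samples=100, seed=0):
    """Generate synthetic variants of one gesture trajectory.

    traj: (T, D) array of hand positions over time -- a stand-in for the
    paper's "gist of a gesture" features (hypothetical format). The
    perturbations below are meant to mimic human motor variability while
    preserving the gesture's fundamental structure.
    """
    rng = np.random.default_rng(seed)
    T, D = traj.shape
    t = np.linspace(0.0, 1.0, T)
    samples = []
    for _ in range(n_samples):
        # Amplitude scaling: the same gesture reproduced slightly
        # larger or smaller (roughly +/-5%).
        scale = rng.normal(1.0, 0.05)

        # Smooth, monotone temporal warping: a perturbed time axis
        # mimics variation in execution speed.
        freq = rng.uniform(0.5, 2.0)
        warp = t + 0.05 * np.sin(2.0 * np.pi * freq * t)
        warp = (warp - warp[0]) / (warp[-1] - warp[0])  # renormalize to [0, 1]
        warped = np.stack(
            [np.interp(t, warp, traj[:, d]) for d in range(D)], axis=1
        )

        # Low-frequency spatial jitter, smoothed so the path stays realistic.
        noise = rng.normal(0.0, 0.01, size=(T, D))
        kernel = np.ones(9) / 9.0
        noise = np.stack(
            [np.convolve(noise[:, d], kernel, mode="same") for d in range(D)],
            axis=1,
        )
        samples.append(scale * warped + noise)
    return np.stack(samples)  # (n_samples, T, D)
```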

Highlights

  • Gestures are a key component of human–human interactions (Kendon, 1986)

  • Once the lexicon has been selected, the gist of the gesture extracted, and the data set expanded with artificial observations, four different classifiers are trained with these data sets to achieve one-shot gesture recognition (see the sketch after this list)

  • The performance of these classifiers is demonstrated in the following subsections in terms of accuracy and efficiency when compared with traditional N-shot learning approaches
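
As a toy continuation of the sketch above, and not the authors' implementation, the snippet below expands a one-sample-per-class lexicon with the hypothetical augment_gesture() and trains two standard scikit-learn classifiers in place of the paper's four. The file names, feature flattening, and classifier choices are all assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical lexicon: one recorded (T, D) trajectory per gesture class,
# all sharing the same length T so they flatten to equal-size vectors.
lexicon = {0: np.load("wave.npy"), 1: np.load("circle.npy")}  # placeholder paths

X, y = [], []
for label, traj in lexicon.items():
    for sample in augment_gesture(traj, n_samples=100, seed=label):
        X.append(sample.ravel())  # flatten (T, D) into one feature vector
        y.append(label)
X, y = np.asarray(X), np.asarray(y)

# Two common classifiers stand in for the paper's four; cross-validation
# accuracy on the synthetic set is only a proxy for the reported results.
for clf in (KNeighborsClassifier(n_neighbors=5), SVC(kernel="rbf")):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, round(scores.mean(), 3))
```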

Introduction

Gestures are a key component of human–human interactions (Kendon, 1986). We expect machines and service robots to be able to understand this form of interaction as intuitively as humans do. Having seen a gesture only once, we are able to recognize it the next time it is presented because of our capability to learn from just a few examples and to make associations between concepts (Brown and Kane, 1988; Ormrod and Davis, 2004; Johnson et al., 2005). Modeling this capability is one of the main challenges faced in the development of natural human–robot interaction (HRI). Researchers have been studying how gestures are produced, perceived, and mimicked, as well as how computer systems can detect and recognize them. This last area is especially relevant to human–computer interaction (Pavlovic et al., 1997; Rautaray and Agrawal, 2015), HRI (Nickel and Stiefelhagen, 2007; Yang et al., 2007), and assistive technologies (Jacob and Wachs, 2014; Jiang et al., 2016b), where humans rely on accurate recognition by machines.
