Abstract

Recent developments in sensors that track human movements and gestures enable rapid progress in applications from domains such as medical rehabilitation and robotic control. The inertial measurement unit (IMU) in particular is well suited to real-time scenarios because it delivers input data at a high rate. A computational model for such data must therefore be able to learn gesture sequences quickly yet robustly. We recently introduced an echo state network (ESN) framework for continuous gesture recognition (Tietz et al., 2019) that includes novel approaches for gesture spotting, i.e., the automatic detection of the start and end phase of a gesture. Although our results showed good classification performance, we identified factors that significantly degrade performance, such as subgestures and gesture variability. To address these issues, we conduct experiments with Long Short-Term Memory (LSTM) networks, a state-of-the-art model for sequence processing, in order to compare their results with those of our framework and to evaluate the robustness of both with respect to pitfalls in the recognition process. In this study, we analyze these two conceptually different approaches to processing continuous, variable-length gesture sequences, which yields interesting results when comparing the distinct gesture executions. In addition, our results demonstrate that our ESN framework achieves performance comparable to that of the LSTM network while requiring significantly lower training times. We conclude that ESNs are viable models for continuous gesture recognition, delivering reasonable performance for applications that require real-time operation, such as robotic or rehabilitation tasks. Based on our discussion of this comparative study, we suggest prospective improvements at both the experimental and the network architecture level.

Highlights

  • If the total activity is above a predefined threshold of 0.4, we start summing up all individual outputs over time until the total activity falls below the threshold again.

  • … (WG). A prediction segment that does not overlap any actual class segment is counted as a false positive (FP); an actual class segment without a mapped prediction is counted as a false negative (FN).
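The threshold-based spotting rule in the highlight above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume that "total activity" is the sum of all class outputs at one time step, that a detected segment is labeled with the class whose summed output is largest, and that segments are half-open intervals [start, end). The function name `spot_gestures` is hypothetical.

```python
import numpy as np


def spot_gestures(outputs, threshold=0.4):
    """Detect gesture segments from per-class network outputs.

    outputs: array-like of shape (T, n_classes).
    Returns a list of (start, end, class_index) with end exclusive.
    Assumptions (not from the source): total activity = sum of class outputs,
    and the segment label = argmax of the outputs summed over the segment.
    """
    outputs = np.asarray(outputs, dtype=float)
    total = outputs.sum(axis=1)  # total activity per time step
    segments = []
    start, accumulated = None, None
    for t, activity in enumerate(total):
        if activity > threshold:
            if start is None:  # activity crossed the threshold: open a segment
                start = t
                accumulated = np.zeros(outputs.shape[1])
            accumulated += outputs[t]  # sum the individual outputs over time
        elif start is not None:  # activity fell below the threshold: close it
            segments.append((start, t, int(np.argmax(accumulated))))
            start, accumulated = None, None
    if start is not None:  # sequence ended while a gesture was still active
        segments.append((start, len(total), int(np.argmax(accumulated))))
    return segments


outputs = [[0.0, 0.1], [0.3, 0.2], [0.4, 0.1], [0.05, 0.05],
           [0.1, 0.5], [0.0, 0.6], [0.0, 0.0]]
print(spot_gestures(outputs))  # [(1, 3, 0), (4, 6, 1)]
```

Summing the outputs over the whole active phase, rather than deciding per time step, lets a brief misclassification in the middle of a gesture be outvoted by the rest of the segment.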
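The false-positive and false-negative rules in the highlight on segment matching can be sketched as below. The preceding rule ending in "(WG)" is truncated in the source, so it is not reproduced here. As assumptions: segments are half-open (start, end, label) tuples, each actual segment is mapped to the overlapping prediction with the largest overlap, and overlapping predictions are split into a `correct` bucket (same label) and a hypothetical `mismatched` bucket (different label). The names `overlap` and `score_segments` are illustrative only.

```python
def overlap(a, b):
    """Length of the overlap of half-open segments a=(start, end, ...) and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def score_segments(predicted, actual):
    """Count segment-level outcomes for one sequence (illustrative sketch)."""
    counts = {"correct": 0, "mismatched": 0, "FP": 0, "FN": 0}
    # FP: a predicted segment that overlaps no actual class segment.
    for p in predicted:
        if all(overlap(p, a) == 0 for a in actual):
            counts["FP"] += 1
    for a in actual:
        hits = [p for p in predicted if overlap(p, a) > 0]
        if not hits:
            counts["FN"] += 1  # actual class segment without a mapped prediction
            continue
        # Assumption: map to the prediction with the largest overlap.
        best = max(hits, key=lambda p: overlap(p, a))
        counts["correct" if best[2] == a[2] else "mismatched"] += 1
    return counts


actual = [(0, 10, 0), (20, 30, 1), (40, 50, 2)]
predicted = [(2, 9, 0), (22, 28, 2), (60, 65, 1)]
print(score_segments(predicted, actual))
# {'correct': 1, 'mismatched': 1, 'FP': 1, 'FN': 1}
```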

Summary

Introduction

Continuous gesture recognition is a challenging task due to three critical aspects: (1) the correct identification of the start and end of the actual gesture, called subgesture, (2) the recognition of a gesture of possibly variable length, called inter-subject variability, and (3) the accurate distinction between an active gesture and subtle movements or silent phases such as pauses. Echo state networks (ESNs), a particular form of reservoir computing (RC) proposed by Jaeger [14], have been successfully applied to language processing [12, 25], navigation tasks [6], and central pattern generation [28]. The RC community has grown in recent years owing to the successful implementation of reservoirs in hardware [2, 23], which supports real-world applications such as human action recognition [1]. Although gestures are sequences similar to sentences, human actions, or path trajectories, surprisingly little is known about the potential application of ESNs to the task of gesture recognition [8, 15].
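To make the reservoir idea concrete, here is a minimal, generic ESN sketch. It is not the framework of Tietz et al. (2019); the class name `MinimalESN`, the reservoir size, spectral radius, input scaling, and ridge parameter are all illustrative assumptions. It shows why ESN training is cheap: the input and recurrent weights are random and fixed, and only the linear readout is fitted in closed form by ridge regression, whereas an LSTM's weights are trained iteratively by backpropagation through time.

```python
import numpy as np


class MinimalESN:
    """Generic echo state network sketch with a ridge-regression readout."""

    def __init__(self, n_inputs, n_reservoir=200, spectral_radius=0.9,
                 input_scaling=0.5, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input and recurrent weights (never trained).
        self.W_in = rng.uniform(-input_scaling, input_scaling,
                                (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the largest absolute eigenvalue equals spectral_radius,
        # a common heuristic for obtaining the echo state property.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge
        self.W_out = None

    def states(self, inputs):
        """Run the reservoir over a (T, n_inputs) sequence -> (T, n_reservoir)."""
        x = np.zeros(self.W.shape[0])
        collected = []
        for u in np.asarray(inputs, dtype=float):
            x = np.tanh(self.W_in @ u + self.W @ x)
            collected.append(x)
        return np.array(collected)

    def fit(self, inputs, targets):
        """Closed-form readout: W_out = Y^T X (X^T X + ridge * I)^-1."""
        X = self.states(inputs)
        Y = np.asarray(targets, dtype=float)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ Y).T
        return self

    def predict(self, inputs):
        """Linear readout of the reservoir states -> (T, n_outputs)."""
        return self.states(inputs) @ self.W_out.T


# Usage: learn to reproduce the current input value from the reservoir state.
rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, (500, 1))
esn = MinimalESN(n_inputs=1).fit(u, u)
print(np.mean((esn.predict(u) - u) ** 2))
```

For gesture recognition, the targets would instead be per-class output signals (for example, one-hot labels over time), whose outputs could then be passed to a spotting rule such as the thresholding described in the highlights.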
