Abstract

Full-body motion capture typically requires sensors/markers to be placed on each rigid body segment, which results in long setup times and is obtrusive. The number of sensors/markers can be reduced using deep learning or offline methods. However, these methods require large training datasets and/or substantial computational resources. Therefore, we investigate the following research question: “What is the performance of a shallow approach, compared to a deep learning one, for estimating time-coherent full-body poses using only five inertial sensors?” We propose to incorporate past/future inertial sensor information into a stacked input vector, which is fed to a shallow neural network for estimating full-body poses. Shallow and deep learning approaches are compared using the same input vector configurations. Additionally, the inclusion of acceleration input is evaluated. The results show that a shallow learning approach can estimate full-body poses with an accuracy (~6 cm) similar to that of a deep learning approach (~7 cm). However, the jerk errors are smaller with the deep learning approach, which may be an effect of explicit recurrent modelling. Furthermore, the delay of the shallow learning approach (72 ms) is smaller than that of the deep learning approach (117 ms).

Highlights

  • Capturing full-body human motion can be valuable for various applications, such as biomechanical analysis, virtual/augmented reality, and gaming

  • Our hypothesis is that results similar to those of a deep learning approach can be achieved with a shallow learning approach. This resulted in the following research question: “What is the performance of a shallow approach, compared to a deep learning one, for estimating time-coherent full-body poses using only five inertial sensors?” For the shallow learning approach, we propose a stacked input neural network (SINN) that requires smaller datasets and less computing power, which may make it suitable for real-time applications

  • We developed a novel way of considering time dependencies in a shallow artificial neural network (ANN), namely, by moving complexity out of the network into a stacked input vector, which contains past and future information
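
The stacked-input idea in the highlights can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stack_input` and `shallow_forward`, the window sizes, the feature and pose dimensions, the tanh activation, and the random weights are all assumptions chosen only to show how past and future frames can be concatenated into one input vector for a network with a single hidden layer.

```python
import numpy as np


def stack_input(frames: np.ndarray, n_past: int, n_future: int) -> np.ndarray:
    """Stack each frame with its n_past predecessors and n_future successors.

    frames has shape (T, D): T time steps, D sensor features per step.
    Returns shape (T - n_past - n_future, (n_past + 1 + n_future) * D).
    Row k describes the frame at time index k + n_past.
    """
    T, D = frames.shape
    window = n_past + 1 + n_future
    if T < window:
        raise ValueError("need at least n_past + 1 + n_future frames")
    # Each row is one flattened window of consecutive frames.
    return np.stack([frames[i:i + window].ravel() for i in range(T - window + 1)])


def shallow_forward(x, w1, b1, w2, b2):
    """Forward pass of a network with one tanh hidden layer and a linear output."""
    return np.tanh(x @ w1 + b1) @ w2 + b2


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
rng = np.random.default_rng(0)
T, D = 100, 35              # time steps, features per frame (assumed)
n_past, n_future = 5, 5     # window size on each side (assumed)
hidden, pose_dim = 64, 12   # hidden units, output pose size (assumed)

frames = rng.standard_normal((T, D))        # stand-in for sensor data
x = stack_input(frames, n_past, n_future)   # stacked input vectors

# Untrained random weights; a real model would learn these from data.
w1 = rng.standard_normal((x.shape[1], hidden)) * 0.01
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, pose_dim)) * 0.01
b2 = np.zeros(pose_dim)

poses = shallow_forward(x, w1, b1, w2, b2)  # one pose estimate per window
```

In a sketch like this, the time dependency lives entirely in the input: the network itself has no recurrent state. Note that using `n_future` future frames means an estimate for a given frame is only available once those later frames have arrived.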


Summary

Introduction

Capturing full-body human motion can be valuable for various applications, such as biomechanical analysis, virtual/augmented reality, and gaming. Motion capture has the potential to estimate kinetic quantities for various activities [3,4,5]. Virtual/augmented reality can produce realistic training environments for patients by providing interaction with the virtual elements using motion capture (e.g., knee osteoarthritis [6] or phantom limb pain [7]). The success of Microsoft Kinect shows that motion capture can be applied to (serious) gaming (e.g., for traumatic brain injury patients [8] and neurological rehabilitation [9]). Full-body motion capture is currently done by using either body-worn sensors (e.g., inertial measurement units (IMUs) [10]) or external

