Abstract
Among the infinite number of possible movements that can be produced, humans are commonly assumed to choose those that optimize criteria such as minimizing movement time, subject to certain movement constraints like signal-dependent and constant motor noise. While so far these assumptions have only been evaluated for simplified point-mass or planar models, we address the question of whether they can predict reaching movements in a full skeletal model of the human upper extremity. We learn a control policy using a motor babbling approach as implemented in reinforcement learning, using aimed movements of the tip of the right index finger towards randomly placed 3D targets of varying size. We use a state-of-the-art biomechanical model, which includes seven actuated degrees of freedom. To deal with the curse of dimensionality, we use a simplified second-order muscle model, acting at each degree of freedom instead of individual muscles. The results confirm that the assumptions of signal-dependent and constant motor noise, together with the objective of movement time minimization, are sufficient for a state-of-the-art skeletal model of the human upper extremity to reproduce complex phenomena of human movement, in particular Fitts’ Law and the 2/3 Power Law. This result supports the notion that control of the complex human biomechanical system can plausibly be determined by a set of simple assumptions and can easily be learned.
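For reference, the two regularities named in the abstract are commonly written as shown below. The abstract does not give formulas, so the specific forms here are assumptions made for illustration: the Shannon formulation of Fitts' Law with empirical constants a and b, and the curvature form of the 2/3 Power Law with a velocity gain k.

```latex
% Fitts' Law (Shannon formulation): movement time MT for a target
% at distance D with width W; a and b are fitted constants.
MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)

% 2/3 Power Law: angular velocity \omega as a function of path
% curvature \kappa, and the equivalent form for tangential velocity v.
\omega(t) = k\,\kappa(t)^{2/3}
\quad\Longleftrightarrow\quad
v(t) = k\,\kappa(t)^{-1/3}
```

The two forms of the Power Law are equivalent because v = \omega / \kappa.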
Highlights
Reinforcement learning control of a biomechanical model of the upper extremity. Florian Fischer*, Miroslav Bachinski, Markus Klar, Arthur Fleig & Jörg Müller
The task follows the ISO 9241-9 ergonomics standard and uses 13 equidistant targets arranged in a circle that is placed 50 cm in front of the body and 10 cm to the right of the right shoulder (Fig. 2).
The objective is for the end-effector to reach each target and to remain inside it for 100 ms.
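The target layout and the 100 ms dwell criterion can be sketched as follows. This is a minimal illustration, not the authors' implementation: the circle radius, the coordinate frame (x to the right of the shoulder, y up, z forward, in metres), the sampling step, and the function names `circle_targets` and `dwell_reached` are all assumptions that the text above does not specify.

```python
import math

def circle_targets(n=13, radius=0.15, center=(0.10, 0.0, 0.50)):
    """Return n equidistant 3D target centres on a circle in a frontal plane.

    center is (x right of the shoulder, y up, z forward) in metres.
    The frame and the radius are assumed values for illustration.
    """
    cx, cy, cz = center
    return [
        (cx + radius * math.sin(2 * math.pi * k / n),
         cy + radius * math.cos(2 * math.pi * k / n),
         cz)
        for k in range(n)
    ]

def dwell_reached(distances, target_radius, dt, dwell=0.1):
    """True once the end-effector stays inside the target for `dwell` seconds.

    distances holds the end-effector-to-target-centre distance at each
    time step of length dt; consecutive samples inside the target are counted.
    """
    needed = math.ceil(dwell / dt - 1e-9)  # samples that must be inside in a row
    run = 0
    for d in distances:
        run = run + 1 if d <= target_radius else 0
        if run >= needed:
            return True
    return False
```

With a hypothetical 10 ms sampling step, `dwell_reached` requires ten consecutive samples inside the target; a single sample outside the target resets the count.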
Summary
While the thorax is fixed in space, the right upper extremity can move freely by actuating its seven degrees of freedom (DOFs). To deal with the curse of dimensionality and make the control problem tractable, following van Beers et al. [4], we use a simplified second-order muscle model acting at each DOF instead of individual muscles. These second-order dynamics map an action vector obtained from the learned policy to the resulting activations for each DOF. The optimal value of a given state is estimated by sampling different actions in the environment and observing the subsequent state and the obtained reward [5].
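One way to picture the second-order muscle model is as two chained first-order low-pass filters per DOF: the policy's action drives an excitation, which in turn drives the activation. The sketch below uses an explicit Euler step. The time constants `t_e` and `t_a`, the step size `dt`, and the function name `step_activation` are assumed for illustration, since the summary above does not state them.

```python
def step_activation(action, excitation, activation,
                    dt=0.002, t_e=0.03, t_a=0.04):
    """Advance the per-DOF second-order dynamics by one explicit Euler step.

    action, excitation, activation: lists with one entry per DOF.
    t_e, t_a: excitation and activation time constants in seconds
              (assumed values, not taken from the text above).
    """
    # First filter: excitation follows the action.
    new_excitation = [e + dt * (u - e) / t_e
                      for u, e in zip(action, excitation)]
    # Second filter: activation follows the (previous) excitation.
    new_activation = [a + dt * (e - a) / t_a
                      for e, a in zip(excitation, activation)]
    return new_excitation, new_activation
```

Starting from rest and holding a constant action, the activation of each DOF rises smoothly towards that action instead of jumping to it, which is the low-pass behaviour a second-order model is meant to capture.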