Because of their unique adaptability, flexibility, and robustness, musculoskeletal robotic systems are regarded as potential next-generation robots. However, motion learning and generation for such systems remain challenging. This paper presents a neuromuscular control method, TMS-PPO, based on time-varying muscle synergy (TMS) and proximal policy optimization (PPO). The electromyogram (EMG) activation signals of actual human motions are decomposed into TMSs based on the temporal properties of the TMS. Network weights are trained with PPO to generate scale and phase coefficients. These coefficients modulate the TMSs to generate appropriate activation patterns and thereby optimize motion learning of the musculoskeletal system. To verify the effectiveness of the proposed method, TMSs are extracted from human upper limb muscle activation signals, and TMS-PPO is compared with PPO in the motion learning and generation process of an upper limb musculoskeletal system. The results show that TMS-PPO completes the control tasks, with average joint errors below 0.05 rad. In addition, the TMSs serve as motion primitives of the musculoskeletal system, simulating how the human central nervous system (CNS) controls muscles. Compared with PPO, TMS-PPO reduces energy consumption and significantly improves learning speed: the number of learning episodes falls from \(10^4\) to \(10^3\), which indicates that TMS-PPO has a stronger learning ability and a better physiological interpretation.
Note to Practitioners: Owing to the advantages of the musculoskeletal system, research on humanoid robots that imitate the human actuation mechanism is being actively pursued worldwide. With its human-like characteristics, the musculoskeletal robot provides new opportunities to understand and validate the human mechanisms of muscle control and motion learning, to compare the performance of the robot with that of humans, and to work in the real world in the future, e.g., as human-interactive robots, amusement robots, and medical training robots. However, the strong redundancy, coupling, and nonlinearity of the system also raise many challenges for the control problem. Inspired by how the human CNS controls a musculoskeletal system and realizes motion generalization, this paper proposes TMS-PPO, a novel muscle-synergy-based neuromuscular control method that combines time-varying muscle synergy (TMS) and proximal policy optimization (PPO). The method improves the learning efficiency of PPO and the physiological interpretation of the control process during motion learning and generation of the musculoskeletal system. Preliminary simulation experiments suggest that the method is feasible in terms of control accuracy and efficiency. However, the performance of TMS-PPO is otherwise comparable to that of PPO, without significant improvement. To address this, future work will introduce a cerebellar model into the control method, reflecting the cerebellum's role in adjusting and correcting limb motions to achieve accurate and stable control during human actions.
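The modulation step described in the abstract, in which scale and phase coefficients reshape pre-extracted TMSs into muscle activations, can be illustrated with a minimal sketch. It assumes a common formulation of the time-varying synergy model, m(t) = Σ_i c_i · w_i(t − t_i), where w_i is the i-th synergy waveform, c_i its scale coefficient, and t_i its phase (onset delay) in samples; the paper's exact formulation may differ. The function name `combine_synergies`, the data layout, and the toy numbers are hypothetical, and the coefficients are hard-coded here, whereas in TMS-PPO they would be produced by the PPO-trained networks.

```python
def combine_synergies(synergies, scales, delays, n_steps):
    """Sum scaled, time-shifted synergies into muscle activations.

    synergies: list of synergies, each a list of per-muscle waveforms
               (synergies[i][m][k] = muscle m, sample k of synergy i).
    scales:    one non-negative scale coefficient per synergy (assumed).
    delays:    one non-negative onset delay per synergy, in samples (assumed).
    n_steps:   length of the output activation sequence.
    """
    n_muscles = len(synergies[0])
    activation = [[0.0] * n_steps for _ in range(n_muscles)]
    for waveforms, scale, delay in zip(synergies, scales, delays):
        if delay < 0:
            raise ValueError("delays must be non-negative sample offsets")
        for m, waveform in enumerate(waveforms):
            for k, value in enumerate(waveform):
                t = delay + k
                if t < n_steps:  # samples past the horizon are dropped
                    activation[m][t] += scale * value
    # Muscle activations are conventionally bounded to [0, 1].
    return [[min(1.0, max(0.0, a)) for a in row] for row in activation]


# Toy example: two synergies, two muscles, three samples per synergy.
toy_synergies = [
    [[1.0, 1.0, 1.0], [0.0, 0.5, 0.0]],
    [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
]
toy_activation = combine_synergies(
    toy_synergies, scales=[0.5, 1.0], delays=[0, 2], n_steps=4
)
```

Because the policy only has to output one scale and one delay per synergy rather than a full activation value for every muscle at every time step, the search space is much smaller, which is consistent with the reduction in learning episodes reported above.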