Abstract

Advances in machine learning technologies in recent years have facilitated developments in autonomous robotic systems. Designing these autonomous systems typically requires manually specified models of the robotic system and the world when using classical control-based strategies, or time-consuming and computationally expensive data-driven training when using learning-based strategies. Combining classical control and learning-based strategies may mitigate both requirements. However, the performance of the combined control system is not obvious, given that two separate controllers are involved. This paper focuses on one such combination, which uses gravity compensation together with reinforcement learning (RL). We present a study of the effects of gravity compensation on the performance of two RL algorithms when solving reaching tasks with a simulated seven-degree-of-freedom robotic arm. The results of our study demonstrate that gravity compensation coupled with RL can reduce the training required for reaching tasks involving elevated target locations, but not for all target locations.

Highlights

  • Autonomous robotic systems are widely recognized as a worthwhile technological goal for humanity to achieve

  • The number of time steps chosen for each run of Actor Critic using Kronecker-Factored Trust Region (ACKTR) and PPO2 corresponded to the smallest number of time steps necessary to accumulate close to the maximum reward with the best hyperparameter selection found in our initial experiments

  • We introduced an approach to combine reinforcement learning (RL) and control theory by adding terms used in a classical controller to the RL output (a minimal illustration follows below)
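To make the combination concrete, the following is a minimal sketch, not the authors' implementation, of how a classical gravity-compensation term could be added to an RL policy's torque output in a Gym-style environment. The class name GravityCompensationWrapper, the gravity_torque callable, and the get_joint_positions accessor are hypothetical names introduced here for illustration; in practice they would come from the simulator's dynamics model and state interface for the 7-DOF arm.

```python
import gym
import numpy as np


class GravityCompensationWrapper(gym.Wrapper):
    """Adds an estimated gravity torque G_hat(q) to the torques chosen by the RL policy.

    `gravity_torque` (a callable mapping joint positions q to G_hat(q)) and
    `get_joint_positions` are hypothetical placeholders for illustration only.
    """

    def __init__(self, env, gravity_torque):
        super().__init__(env)
        self._gravity_torque = gravity_torque

    def step(self, action):
        q = self.env.get_joint_positions()        # joint angles (hypothetical accessor)
        tau_rl = np.asarray(action)               # torque proposed by the RL policy
        tau = tau_rl + self._gravity_torque(q)    # total torque applied to the arm
        return self.env.step(tau)
```

Under this kind of setup, an agent such as PPO2 or ACKTR would be trained on the wrapped environment exactly as on the original one, so the policy only needs to learn the residual torque on top of the gravity-compensation term.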

Summary

Introduction

Autonomous robotic systems are widely recognized as a worthwhile technological goal for humanity to achieve. Control theory provides a methodology for creating controllers for dynamical systems in order to accomplish a specified task [14]. One way to use the dynamics model in (1) is to use estimates of $M$, $C$, $h$, and $G$ (denoted $\hat{M}$, $\hat{C}$, $\hat{h}$, and $\hat{G}$, respectively) to compensate for the system dynamics and create an auxiliary control law for the system. The form of this control is shown in (2), where $\tau_{\text{cont}}$ is the auxiliary torque. The choice of $\tau_{\text{cont}}$ is arbitrary and may be selected using a feedback controller to stabilize the system or to drive the states to a desired value.
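Equations (1) and (2) themselves are not reproduced in this excerpt. As a reconstruction under standard rigid-body manipulator conventions, and assuming $M$, $C$, $h$, and $G$ denote the inertia matrix, Coriolis matrix, friction/velocity-dependent term, and gravity term, they would typically take a form such as

$$M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + h(\dot{q}) + G(q) = \tau \tag{1}$$

$$\tau = \hat{M}(q)\,\tau_{\text{cont}} + \hat{C}(q,\dot{q})\,\dot{q} + \hat{h}(\dot{q}) + \hat{G}(q) \tag{2}$$

When the estimates in (2) match the true dynamics, substituting (2) into (1) leaves the closed-loop behavior governed by $\tau_{\text{cont}}$ alone, which is why its choice can be made freely. The exact grouping of terms in the paper's equations may differ from this sketch.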
