Abstract

Reinforcement learning (RL) agents can learn to control a nonlinear system without using a model of the system. However, having a model brings benefits, mainly in terms of a reduced number of unsuccessful trials before achieving acceptable control performance. Several modelling approaches have been used in the RL domain, such as neural networks, local linear regression, or Gaussian processes. In this article, we focus on techniques that have not been used much so far: symbolic regression (SR) based on genetic programming, and local modelling. Using measured data, symbolic regression yields a nonlinear, continuous-time analytic model. We benchmark two state-of-the-art SR methods, SNGP (single-node genetic programming) and MGGP (multigene genetic programming), against a standard incremental local regression method called RFWR (receptive field weighted regression). We have introduced modifications to the RFWR algorithm to better suit the low-dimensional, continuous-time systems we are mostly dealing with. The benchmark is a nonlinear, dynamic magnetic manipulation system. The results show that, using the RL framework and a suitable approximation method, it is possible to design a stable controller for such a complex system without the need for haphazard learning. While all of the approximation methods were successful, MGGP achieved the best results, at the cost of higher computational complexity.

Index Terms: AI-based methods, local linear regression, nonlinear systems, magnetic manipulation, model learning for control, optimal control, reinforcement learning, symbolic regression.
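
As a rough illustration of the modelling step only (not the SNGP or MGGP implementations used in the article), the sketch below fits an analytic approximation of one state derivative from measured state-action samples with the open-source gplearn symbolic-regression package; the data file, column layout, and all parameter values are assumptions made for the example.

    # Minimal sketch: learn an analytic model x_dot ~ f(x, u) from measured data
    # using genetic-programming-based symbolic regression. gplearn is a stand-in
    # for the SNGP/MGGP methods discussed in the article; the data file name and
    # array layout below are hypothetical.
    import numpy as np
    from gplearn.genetic import SymbolicRegressor

    data = np.load("magman_samples.npz")       # hypothetical measured data set
    X = data["states_and_actions"]             # shape (N, 3): position, velocity, coil current
    y = data["acceleration"]                   # shape (N,): target state derivative

    model = SymbolicRegressor(
        population_size=500,
        generations=40,
        function_set=("add", "sub", "mul", "div"),
        parsimony_coefficient=0.001,           # penalize overly long expressions
        random_state=0,
    )
    model.fit(X, y)
    print(model._program)                      # the evolved analytic expression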

Highlights

  • A reinforcement learning (RL) agent interacts with the system to be controlled by measuring its states and applying actions according to a policy so that a given goal state is attained. The policy is iteratively adapted so that the agent receives the highest possible cumulative reward, which is a scalar value accumulated over trajectories in the system's state space. The reward associated with each transition in the state space is given by a predefined reward function. Existing RL algorithms can be divided into critic-only, actor-only, and actor-critic variants. The critic-only variants optimize the value function (V-function), which is then used to derive the policy; the actor-only variants optimize the policy directly, without any need for a value function; and the actor-critic variants optimize both functions simultaneously

  • We compared the performance of various models of a complex nonlinear system created with three different approximation methods: two based on genetic programming and one based on a modified local linear regression algorithm (RFWR)

  • Most of the models selected for the actual control experiments achieved stable control, although they differ in precision. The histogram in Figure 10 shows, for the two GP-based algorithms (SNGP and multigene genetic programming (MGGP)), the number of models that fall into several MSE categories. The MSE describes the control precision as the mean-squared error between the goal trajectory and the actual trajectory of the closed-loop controlled system; a minimal computation sketch follows this list
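
For concreteness, the control-precision measure referred to in the last highlight can be computed as below. This is only an illustrative sketch; the trajectories and sampling are placeholders, not data from the article.

    import numpy as np

    def control_mse(goal_traj, actual_traj):
        """Mean-squared error between the goal trajectory and the measured
        closed-loop trajectory (the control-precision measure used above)."""
        goal_traj = np.asarray(goal_traj, dtype=float)
        actual_traj = np.asarray(actual_traj, dtype=float)
        return float(np.mean((goal_traj - actual_traj) ** 2))

    # Hypothetical example: a constant position reference of 0.01 m over 100 samples
    goal = np.full(100, 0.01)
    actual = goal + 0.0005 * np.random.randn(100)   # placeholder measured response
    print(control_mse(goal, actual))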

Summary

Introduction

A reinforcement learning (RL) agent interacts with the system to be controlled by measuring its states and applying actions according to a policy so that a given goal state is attained. The policy is iteratively adapted so that the agent receives the highest possible cumulative reward, which is a scalar value accumulated over trajectories in the system's state space. The reward associated with each transition in the state space is given by a predefined reward function.

Existing RL algorithms can be divided into critic-only, actor-only, and actor-critic variants. The critic-only variants optimize the value function (V-function), which is then used to derive the policy; the actor-only variants optimize the policy directly, without any need for a value function; and the actor-critic variants optimize both functions simultaneously. From a different point of view, RL algorithms can be divided into model-based and model-free variants; examples of both approaches can be found in [3, 4]. Model-free methods learn online, exclusively through trial and error. Both variants have their specific advantages and disadvantages. We employ the model-based, critic-only variant without any online training, so that we can compare different modelling approaches.
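
A minimal sketch of this critic-only, model-based setting, assuming a one-dimensional toy system: value iteration computes the V-function from a (learned) model and a reward function, and the policy is then derived greedily from the V-function and the model. The state grid, the model f_hat, and the reward rho below are placeholder assumptions, not the magnetic-manipulation setup from the article.

    # Critic-only, model-based RL sketch: value iteration on a discretized
    # 1-D state space with a learned model f_hat(x, u) and reward rho(x).
    # Everything here is a toy placeholder used only to illustrate the scheme.
    import numpy as np

    gamma = 0.95                                   # discount factor
    states = np.linspace(-1.0, 1.0, 101)           # 1-D state grid (placeholder)
    actions = np.linspace(-1.0, 1.0, 21)           # admissible actions (placeholder)

    def f_hat(x, u):
        """Learned process model; here a made-up stable linear map."""
        return 0.9 * x + 0.1 * u

    def rho(x):
        """Reward: negative squared distance to the goal state x = 0."""
        return -x ** 2

    V = np.zeros_like(states)
    for _ in range(200):                           # value iteration (the critic)
        for i, x in enumerate(states):
            x_next = f_hat(x, actions)             # predicted next state for every action
            V[i] = np.max(rho(x_next) + gamma * np.interp(x_next, states, V))

    def policy(x):
        """Greedy policy derived from the V-function and the model."""
        x_next = f_hat(x, actions)
        return actions[np.argmax(rho(x_next) + gamma * np.interp(x_next, states, V))]

    print(policy(0.5))                             # action chosen at state x = 0.5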
