Abstract

Classical control theory requires that a model of the system be derived before any control design can take place. For complex systems, this can be a hard and time-consuming process, and modelling errors can never be avoided entirely. An alternative approach is to have the system learn a controller by itself, either while in operation or offline. Reinforcement learning (RL) is such a framework, in which an agent (or controller) optimises its behaviour by interacting with its environment. For continuous state and action spaces, the use of function approximators is a necessity, and a commonly used class of RL algorithms for these continuous spaces is the actor-critic algorithm, in which two independent function approximators take the roles of the policy (the actor) and the value function (the critic). A main challenge in RL is to use the information gathered during the interaction as efficiently as possible, so that an optimal policy may be reached in a short amount of time. At each time step, the majority of RL algorithms measure the state, choose an action corresponding to this state, measure the next state and the corresponding reward, and update a value function (and possibly a separate policy). As such, the only source of information used for learning at each time step is the most recent transition sample. This thesis proposes novel actor-critic methods that aim to shorten the learning time by using every transition sample collected during learning to learn a model of the system online. It also explores the possibility of speeding up learning by providing the agent with explicit knowledge of the reward function.
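
The generic actor-critic loop described above can be sketched as follows. This is a minimal illustrative example of a one-step actor-critic update with linear function approximation and a Gaussian exploration policy, not the specific model-learning methods proposed in the thesis; the feature map `phi`, the environment interface `env.reset()`/`env.step()`, and all learning-rate values are assumptions made for illustration.

```python
# Minimal sketch of a one-step actor-critic update with linear function
# approximation and Gaussian exploration. It illustrates the standard loop
# in which only the most recent transition sample is used for learning;
# it is NOT the model-learning approach developed in the thesis.
import numpy as np

def phi(state, n_features=8):
    # Hypothetical feature map: radial basis functions over a 1-D state.
    centers = np.linspace(-1.0, 1.0, n_features)
    return np.exp(-0.5 * ((state - centers) / 0.25) ** 2)

def actor_critic_episode(env, theta, w, alpha_actor=0.01, alpha_critic=0.1,
                         gamma=0.97, sigma=0.3):
    """Run one episode, updating actor weights `theta` and critic weights `w`."""
    state = env.reset()
    done = False
    while not done:
        features = phi(state)
        # Actor: Gaussian policy around a linear-in-features mean action.
        mean_action = theta @ features
        action = mean_action + sigma * np.random.randn()

        # Assumed environment interface returning (next_state, reward, done).
        next_state, reward, done = env.step(action)

        # Critic: temporal-difference error of the linear value function.
        v = w @ features
        v_next = 0.0 if done else w @ phi(next_state)
        td_error = reward + gamma * v_next - v

        # Critic update: move the value estimate towards the TD target.
        w += alpha_critic * td_error * features
        # Actor update: policy-gradient step weighted by the TD error.
        theta += alpha_actor * td_error * (action - mean_action) / sigma**2 * features

        state = next_state
    return theta, w
```

In this standard scheme, each transition sample is used once for a single update and then discarded; the methods proposed in the thesis instead reuse every collected sample to learn a model of the system online.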
