Abstract

Classical control theory requires that a model of the system be derived before any control design can take place. For complex systems, this can be a hard and time-consuming process, and modelling errors can never be avoided entirely. An alternative approach is to have the system learn a controller by itself, either while in operation or offline. Reinforcement learning (RL) is such a framework, in which an agent (or controller) optimises its behaviour by interacting with its environment. For continuous state and action spaces, the use of function approximators is a necessity, and a commonly used class of RL algorithms for these continuous spaces is the actor-critic algorithm, in which two independent function approximators take the roles of the policy (the actor) and the value function (the critic). A main challenge in RL is to use the information gathered during the interaction as efficiently as possible, so that an optimal policy may be reached in a short amount of time. At each time step, the majority of RL algorithms measure the state, choose an action corresponding to this state, measure the next state and the corresponding reward, and update a value function (and possibly a separate policy). As such, the only source of information used for learning at each time step is the most recent transition sample. This thesis proposes novel actor-critic methods that aim to shorten the learning time by using every transition sample collected during learning to learn a model of the system online. It also explores the possibility of speeding up learning by providing the agent with explicit knowledge of the reward function.
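
The generic actor-critic loop described above can be sketched as follows. This is a minimal illustrative example of a one-step actor-critic update with linear function approximation and a Gaussian exploration policy, not the specific model-learning methods proposed in the thesis; the feature map `phi`, the environment interface `env.reset()`/`env.step()`, and all learning-rate values are assumptions made for illustration.

```python
# Minimal sketch of a one-step actor-critic update with linear function
# approximation and Gaussian exploration. It illustrates the standard loop
# in which only the most recent transition sample is used for learning;
# it is NOT the model-learning approach developed in the thesis.
import numpy as np

def phi(state, n_features=8):
    # Hypothetical feature map: radial basis functions over a 1-D state.
    centers = np.linspace(-1.0, 1.0, n_features)
    return np.exp(-0.5 * ((state - centers) / 0.25) ** 2)

def actor_critic_episode(env, theta, w, alpha_actor=0.01, alpha_critic=0.1,
                         gamma=0.97, sigma=0.3):
    """Run one episode, updating actor weights `theta` and critic weights `w`."""
    state = env.reset()
    done = False
    while not done:
        features = phi(state)
        # Actor: Gaussian policy around a linear-in-features mean action.
        mean_action = theta @ features
        action = mean_action + sigma * np.random.randn()

        # Assumed environment interface returning (next_state, reward, done).
        next_state, reward, done = env.step(action)

        # Critic: temporal-difference error of the linear value function.
        v = w @ features
        v_next = 0.0 if done else w @ phi(next_state)
        td_error = reward + gamma * v_next - v

        # Critic update: move the value estimate towards the TD target.
        w += alpha_critic * td_error * features
        # Actor update: policy-gradient step weighted by the TD error.
        theta += alpha_actor * td_error * (action - mean_action) / sigma**2 * features

        state = next_state
    return theta, w
```

In this standard scheme, each transition sample is used once for a single update and then discarded; the methods proposed in the thesis instead reuse every collected sample to learn a model of the system online.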
