Abstract

In this paper, we deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we suppose that the agent's knowledge of the current system is represented by a probability distribution π on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to account for the experience the agent gains while exploring the environment, so that it approximates the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the "average" linear quadratic optimal control problem with respect to a given π converges to the optimal control of the linear quadratic optimal control problem governed by the actual, underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms, where prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
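To fix ideas, the following schematic formulation sketches the "average" problem described above; the control matrix B, the weights Q and R, the horizon T, and the initial state x_0 are illustrative placeholders and are not taken verbatim from the paper.

```latex
% Schematic "average" LQ problem under the belief \pi; the symbols B, Q, R, T, x_0
% are illustrative placeholders, not the paper's notation.
\[
  u_\pi \;\in\; \arg\min_{u}\; \mathbb{E}_{A \sim \pi}\!\left[
      \int_0^T x_A(t)^\top Q\, x_A(t) + u(t)^\top R\, u(t)\, \mathrm{d}t \right],
  \qquad
  \dot{x}_A(t) = A\, x_A(t) + B\, u(t), \quad x_A(0) = x_0 ,
\]
% The convergence result states that u_\pi approaches the optimal control of the
% actual system as \pi concentrates at the true matrix A, i.e. as \pi \to \delta_A
% in the Wasserstein distance.
```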

Highlights

  • Reinforcement learning (RL) is one of the three basic machine learning paradigms, together with supervised learning and unsupervised learning

  • We proved some convergence properties for the optimal policies of LQ optimal control problems with uncertainties, assuming that the current belief on the dynamics is represented by a generic probability distribution π on the space of matrices

  • Under standard hypotheses on the system dynamics and the cost functional, we proved that the open-loop optimal control u_π of Problem B converges to the open-loop optimal control of the actual system as soon as the distribution π is sufficiently close (w.r.t. the Wasserstein distance (9)) to a Dirac delta δ_A evaluated at the actual system matrix A; a toy numerical illustration of this convergence is sketched after this list
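The sketch below illustrates the convergence claim on a scalar system with a finite-support belief over the unknown parameter. Everything in it, the scalar dynamics dx/dt = a·x + u, the horizon, the cost weights, and the way the belief is sampled, is an illustrative assumption and not the paper's Problem B or its notation.

```python
# A minimal toy sketch of the convergence claim on a scalar system dx/dt = a*x + u
# with an Euler-discretized, open-loop control and a finite-support belief over the
# unknown parameter a.  All quantities (a_true, T, n, r, the sampled beliefs) are
# illustrative assumptions, not the paper's Problem B or its notation.
import numpy as np

def rollout_maps(a, n, dt):
    """Euler maps so that x_k = phi[k] * x0 + (G[k, :] @ u) for k = 0..n."""
    phi = np.array([(1.0 + a * dt) ** k for k in range(n + 1)])
    G = np.zeros((n + 1, n))
    for k in range(1, n + 1):
        for j in range(k):
            G[k, j] = dt * (1.0 + a * dt) ** (k - 1 - j)
    return phi, G

def averaged_lq_control(a_samples, x0=1.0, T=1.0, n=50, r=0.1):
    """Open-loop u minimizing the empirical average, over the sampled dynamics,
    of the discretized quadratic cost dt * (sum_k x_k**2 + r * sum_k u_k**2)."""
    dt = T / n
    H = r * dt * np.eye(n)             # control-effort block of the Hessian
    g = np.zeros(n)                    # linear term of the averaged cost
    for a in a_samples:
        phi, G = rollout_maps(a, n, dt)
        H += dt * (G.T @ G) / len(a_samples)
        g += dt * (G.T @ (phi * x0)) / len(a_samples)
    return np.linalg.solve(H, -g)      # quadratic cost => linear stationarity system

a_true = -0.5
u_true = averaged_lq_control([a_true])           # optimal control of the actual system
for spread in (1.0, 0.3, 0.1, 0.01):
    belief = a_true + spread * np.linspace(-1.0, 1.0, 5)  # belief concentrating at a_true
    gap = np.linalg.norm(averaged_lq_control(belief) - u_true)
    print(f"spread = {spread:5.2f}   ||u_pi - u_true|| = {gap:.4f}")
```

As the support of the belief shrinks around a_true, the printed gap decreases, mirroring the convergence stated in the highlight above.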


Summary

Introduction

Reinforcement learning (RL) is one of the three basic machine learning paradigms, together with supervised learning and unsupervised learning. The discrete-time problem setting provides an excellent framework to develop methods and algorithms, which often underlie a continuous-time structure. For this reason, in particular in the control system engineering field, significant attention has recently been given to continuous-time RL [8,14,15,17]. Both in discrete- and continuous-time problem settings, one can consider two main RL philosophies: the first one, called model-based, usually concerns the reconstruction of a model from the data, trying to mimic the unknown environment. We present the conclusion and discuss some future directions and open questions.

Preliminaries and notations
Problem statements and preliminary results
Preliminary results for Problem B
Optimality conditions
Main convergence results
A case study: finite-support measures converging to δ_A
A numerical example
Conclusions