Abstract

To further understand the underlying mechanisms of various reinforcement learning (RL) algorithms, and to better exploit optimization theory to make further progress in RL, many researchers have begun to revisit the linear–quadratic regulator (LQR) problem, whose setting is simple yet captures the key characteristics of RL. Motivated by this, this work is concerned with the model-free design of stochastic LQR controllers for linear systems subject to Gaussian noise, from the perspective of primal–dual optimization. We first reformulate the stochastic LQR problem as a constrained non-convex optimization problem, which is shown to have strong duality. Then, to solve this non-convex optimization problem, we propose a model-based primal–dual (MB-PD) algorithm based on the properties of the resulting Karush–Kuhn–Tucker (KKT) conditions. We also give a model-free implementation of the MB-PD algorithm by solving a transformed dual feasibility condition. More importantly, we establish the connection between the proposed MB-PD algorithm and the classical policy iteration algorithm, which provides a novel primal–dual optimization perspective for understanding common RL algorithms. Finally, we provide a high-dimensional case study to show the performance of the proposed algorithms.
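For context, the classical policy iteration scheme that the abstract relates the MB-PD algorithm to is the Hewer-type gain iteration for LQR. Below is a minimal sketch of that classical iteration, not the paper's method: it assumes a discrete-time formulation with known dynamics (A, B), cost matrices (Q, R), and a stabilizing initial gain; all names are illustrative.

```python
# Sketch of classical policy iteration (Hewer's iteration) for discrete-time LQR.
# Assumptions (not from the paper): known (A, B, Q, R) and a stabilizing initial gain K0.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, iters=50, tol=1e-10):
    """Alternate policy evaluation (Lyapunov solve) and policy improvement."""
    K = K0
    for _ in range(iters):
        A_cl = A - B @ K  # closed-loop dynamics under the current gain K
        # Policy evaluation: solve P = A_cl^T P A_cl + Q + K^T R K
        P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # Policy improvement: K_new = (R + B^T P B)^{-1} B^T P A
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        if np.linalg.norm(K_new - K) < tol:
            return K_new, P
        K = K_new
    return K, P

# Toy example: the open-loop system is stable, so K0 = 0 is stabilizing.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
K_opt, P_opt = lqr_policy_iteration(A, B, Q, R, np.zeros((1, 2)))
print("near-optimal gain K:\n", K_opt)
```

With additive Gaussian noise the optimal gain coincides with the deterministic LQR gain (certainty equivalence), so the same iteration applies; the paper's contribution is to recover and interpret such iterations through a primal–dual lens and to give a model-free implementation.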
