Abstract

This paper investigates the combination of model predictive control (MPC) concepts with posterior sampling techniques and proposes a simple constraint tightening technique to introduce cautiousness during exploratory learning episodes. The theoretical analysis, stated in terms of cumulative regret, focuses on previously stated sufficient conditions for the resulting ‘Cautious Bayesian MPC’ algorithm and shows Lipschitz continuity of the future reward function for linear MPC problems. For nonlinear MPC problems, it is shown that assumptions commonly required by nonlinear MPC optimization techniques provide sufficient criteria for model-based RL using posterior sampling. Furthermore, it is shown that, under a soft-constrained MPC formulation, the proposed constraint tightening implies a bound on the expected number of unsafe learning episodes in both the linear and the nonlinear case. The efficiency of the method is illustrated using numerical examples.


Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call