Abstract
In stochastic control applications, typically only an ideal model (controlled transition kernel) is assumed, and the control design is based on this model, raising the problem of performance loss due to the mismatch between the assumed model and the actual model. In other setups, an exact model may be available, but its optimality analysis may be computationally prohibitive, so that the solution of an approximate model is implemented instead. With this motivation, we study continuity properties of discrete-time stochastic control problems with respect to system models, as well as the robustness of optimal control policies designed for incorrect models when applied to the true system. We study both fully observed and partially observed setups under an infinite-horizon discounted expected cost criterion. We show that continuity can be established under total variation convergence of the transition kernels under mild assumptions, and under weak and setwise convergence of the transition kernels with further restrictions on the dynamics and the observation model. Using these results, we establish convergence results and error bounds for the mismatch that occurs when a control policy designed for an incorrectly estimated system model is applied to the actual system, thus establishing robustness. These results have implications for empirical learning in (data-driven) stochastic control, since system models are often learned from empirical training data, for which the weak convergence criterion typically applies but stronger convergence criteria do not. We finally view and establish approximation as a particular instance of robustness.
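As a small numerical illustration (not part of the paper) of why the mode of convergence matters for empirical learning: the empirical measure of i.i.d. samples from an absolutely continuous distribution converges weakly to the true law, but never in total variation, since an atomic measure and an absolutely continuous one are at total variation distance 1. The sketch below, under these standard facts, tracks the Kolmogorov distance (a metric for weak convergence on the real line) between the empirical and true CDFs of a standard normal.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def empirical_vs_true(n):
    """Compare the empirical measure of n N(0,1) samples to the true law."""
    x = np.sort(rng.standard_normal(n))
    # True standard-normal CDF at the sample points.
    true_cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
    # Kolmogorov distance sup_t |F_n(t) - F(t)|: at each order statistic
    # the empirical CDF jumps from (i-1)/n to i/n, so check both sides.
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    kolmogorov = max(np.max(np.abs(upper - true_cdf)),
                     np.max(np.abs(lower - true_cdf)))
    # The empirical measure is purely atomic while N(0,1) has a density,
    # so the total variation distance between them is identically 1.
    tv = 1.0
    return kolmogorov, tv
```

Running this for increasing `n` shows the Kolmogorov distance shrinking (Glivenko-Cantelli) while the total variation distance stays at 1, which is why robustness results requiring only weak convergence are the relevant ones for data-driven model estimation.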