Abstract

This note revisits the rolling-horizon control approach to solving a Markov decision process (MDP) under the infinite-horizon discounted expected reward criterion. In contrast to classical value-iteration approaches, we develop an asynchronous on-line algorithm based on policy iteration, integrated with policy switching, a multi-policy improvement method. A sequence of monotonically improving solutions to the forecast-horizon sub-MDP is generated by updating the current solution only at the currently visited state, yielding in effect a rolling-horizon control policy for the infinite-horizon MDP. Feedback from “supervisors,” if available, can also be incorporated during the updates. We focus on the convergence behavior of the algorithm in relation to the transition structure of the MDP. Depending on this structure, the algorithm achieves in finite time either global convergence to an optimal forecast-horizon policy or local convergence to a “locally optimal” fixed policy.
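To make the abstract's description more concrete, the following is a minimal sketch (not the authors' exact algorithm) of asynchronous, state-by-state policy improvement combined with a policy-switching step on a small finite MDP. The MDP instance, the forecast horizon H, and the "supervisor" policy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, H = 5, 3, 0.95, 10              # states, actions, discount, forecast horizon
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution (assumed model)
R = rng.random((S, A))                       # R[s, a] = one-step reward (assumed model)

def value_H(policy):
    """H-horizon discounted value of a stationary policy."""
    V = np.zeros(S)
    for _ in range(H):
        V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V for s in range(S)])
    return V

def improve_at_state(policy, s, candidates):
    """One asynchronous update: policy switching plus greedy improvement at state s only."""
    # Policy switching: among candidate policies, adopt (at s) the action of the
    # policy whose H-horizon value is largest at s.
    best = max(candidates, key=lambda pi: value_H(pi)[s])
    new_policy = policy.copy()
    new_policy[s] = best[s]
    # One-step greedy improvement at s against the value of the switched policy.
    V = value_H(new_policy)
    new_policy[s] = int(np.argmax([R[s, a] + gamma * P[s, a] @ V for a in range(A)]))
    return new_policy

# Rolling-horizon simulation: the policy is updated only at the state being visited.
policy = np.zeros(S, dtype=int)               # initial policy
supervisor = rng.integers(A, size=S)          # hypothetical "supervisor" feedback
s = 0
for t in range(50):
    policy = improve_at_state(policy, s, [policy, supervisor])
    s = rng.choice(S, p=P[s, policy[s]])      # act, then move to the next state
```

In this sketch the update touches only the currently visited state, and the switching step guarantees the new action at that state is at least as good (over the H-horizon sub-MDP) as both the current policy and the supervisor's suggestion, mirroring the monotone-improvement property described above.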
