Solving Controlled Markov Set-Chains With Discounting via Multipolicy Improvement

Hyeong Soo Chang,Edwin K P Chong

doi:10.1109/tac.2007.892381

Journal: IEEE Transactions on Automatic Control
Publication Date: Mar 1, 2007
Citations: 14

Affiliation: Sogang University, Colorado State University

#Markov Set-chains #Markov Decision Processes

Abstract

We consider Markov decision processes (MDPs) whose state transition probability distributions are not known exactly but are known to lie within given intervals (so-called "controlled Markov set-chains"), under an infinite-horizon discounted reward criterion. We present formal methods for improving multiple policies in order to solve such controlled Markov set-chains. Our multipolicy improvement methods follow the spirit of parallel rollout and policy switching for solving MDPs. In particular, these methods are useful for online control of Markov set-chains and for designing policy iteration (PI) type algorithms. We develop a PI-type algorithm and prove that it converges to an optimal policy.
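
The policy-switching idea mentioned in the abstract can be illustrated on an ordinary MDP. The following is a minimal sketch only, under a strong simplifying assumption: the transition probabilities are known exactly, whereas the paper treats interval-valued probabilities, which this sketch does not handle. The 3-state, 2-action model, the names `P`, `R`, `GAMMA`, `evaluate`, and `policy_switching`, and all numbers are hypothetical and are not taken from the paper.

```python
# Policy switching on a small MDP with exactly known transitions.
# ASSUMPTION: the model below is made up for illustration; the paper's
# setting (interval transition probabilities) is not represented here.

GAMMA = 0.9  # discount factor (hypothetical value)

# P[a][x][y]: probability of moving from state x to state y under action a.
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]],
]

# R[a][x]: one-step reward for taking action a in state x.
R = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.5],
]


def evaluate(policy, iters=2000):
    """Approximate the discounted value of a stationary policy by
    repeatedly applying V <- R_pi + GAMMA * P_pi V."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(iters):
        v = [
            R[policy[x]][x]
            + GAMMA * sum(P[policy[x]][x][y] * v[y] for y in range(n))
            for x in range(n)
        ]
    return v


def policy_switching(policies):
    """In each state, copy the action of whichever base policy has the
    highest value in that state. Returns the combined policy and the
    base policies' value functions."""
    values = [evaluate(p) for p in policies]
    n = len(policies[0])
    switched = []
    for x in range(n):
        best = max(range(len(policies)), key=lambda i: values[i][x])
        switched.append(policies[best][x])
    return switched, values


# Two hypothetical base policies: always action 0, always action 1.
base_policies = [[0, 0, 0], [1, 1, 1]]
switched, base_values = policy_switching(base_policies)
switched_value = evaluate(switched)
```

For an ordinary discounted MDP, the value of the switched policy is at least as large, in every state, as the value of each base policy; comparing `switched_value` against `base_values` state by state checks this on the toy model.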
