The $n$th-Order Bias Optimality for Multichain Markov Decision Processes

Xi-Ren Cao, Junyu Zhang

doi:10.1109/tac.2007.915168

Abstract

In this paper, we propose a new approach to the theory of finite multichain Markov decision processes (MDPs) with different performance optimization criteria. We first propose the concept of $n$th-order bias; then, using the average reward and bias difference formulas derived in this paper, we develop an optimization theory for finite MDPs that covers a complete spectrum from average optimality, bias optimality, to all high-order bias optimality, in a unified way. The approach is simple, direct, natural, and intuitive; it depends neither on Laurent series expansion nor on discounted MDPs. We also propose one-phase policy iteration algorithms for bias and high-order bias optimal policies, which are more efficient than the two-phase algorithms in the literature. Furthermore, we derive high-order bias optimality equations. This research is a part of our effort in developing sensitivity-based learning and optimization theory.
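To make the quantities named in the abstract concrete, the following is a minimal sketch that evaluates the average reward, the bias, and a second-order bias for one fixed policy of a small multichain example. The abstract does not state the definitions, so everything here is an assumption: the sketch uses the textbook deviation-matrix convention, with `g[0] = P* f`, `g[1] = D f`, and `g[n+1] = -D g[n]`, where `D = (I - P + P*)^{-1} - P*`. The function name `bias_sequence`, the transition matrix, and the reward vector are hypothetical, and the code only evaluates a policy; it is not the paper's policy iteration algorithm.

```python
import numpy as np

def bias_sequence(P, f, n_max, k=4096):
    """Return [g_0, ..., g_n_max] for a single fixed policy (illustrative only)."""
    P = np.asarray(P, dtype=float)
    f = np.asarray(f, dtype=float)
    I = np.eye(P.shape[0])
    # Assumption: the chain is aperiodic, so P^k approaches the limiting matrix P*.
    P_star = np.linalg.matrix_power(P, k)
    # Deviation matrix D = (I - P + P*)^{-1} - P* (textbook convention, assumed here).
    D = np.linalg.inv(I - P + P_star) - P_star
    g = [P_star @ f]          # g_0: average reward
    g.append(D @ f)           # g_1: bias
    for _ in range(2, n_max + 1):
        g.append(-D @ g[-1])  # assumed sign convention: g_{n+1} = -D g_n
    return g[: n_max + 1]

# Hypothetical multichain example: states 0 and 1 are absorbing
# (two recurrent classes); state 2 is transient and jumps to either.
P = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.5, 0.5, 0.0]]
f = [1.0, 3.0, 0.0]

g0, g1, g2 = bias_sequence(P, f, n_max=2)
print("average reward:", g0)
print("bias:", g1)
print("2nd-order bias:", g2)
```

Because the example has two recurrent classes, the average reward differs from state to state, which is the feature that separates the multichain case from the unichain one.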

Journal: IEEE Transactions on Automatic Control
Publication Date: Mar 1, 2008
Citations: 45

Similar Papers

Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes
Darong Wu ... Yuanqi Zhao
BMC Medical Research Methodology | VOL. 12
09 Mar 2012

NDP Methods for Multi-chain MDPs
Hao Tang ... Lei Zhou
01 Jan 2006

Generalized Inverses in Discrete Time Markov Decision Processes
Bernard F Lamond ... Martin L Puterman
SIAM Journal on Matrix Analysis and Applications | VOL. 10
01 Jan 1989

Multichain Markov Decision Processes with a Sample Path Constraint: A Decomposition Approach
Keith W Ross ... Ravi Varadarajan
Mathematics of Operations Research | VOL. 16
01 Feb 1991