Previous treatments of multiplicative Markov decision chains (e.g., Bellman [Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, New Jersey.], Mandl [Mandl, P. 1967. An iterative method for maximizing the characteristic root of positive matrices. Rev. Roumaine Math. Pures Appl. XII 1317–1322.], and Howard and Matheson [Howard, R. A., Matheson, J. E. 1972. Risk-sensitive Markov decision processes. Management Sci. 18 356–369.]) restricted attention to stationary policies and assumed that all transition matrices are irreducible and aperiodic. They also used a “first term” optimality criterion, namely maximization of the spectral radius of the associated transition matrix. We give a constructive proof of the existence of optimal policies among all policies under new cumulative average optimality criteria that are more sensitive than maximization of the spectral radius. The algorithm for finding an optimal policy first searches for a stationary policy with a nonnilpotent transition matrix, provided such a policy exists. Otherwise, the method still finds an optimal policy, though in this case the set of optimal policies usually does not contain a stationary one. If a stationary policy with a nonnilpotent transition matrix does exist, we develop a policy improvement algorithm that finds a stationary optimal policy.
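To make the “first term” criterion concrete, the sketch below enumerates the stationary policies of a small finite model, computes the spectral radius of each policy's transition matrix, and flags nilpotent matrices (those whose nth power vanishes, equivalently whose spectral radius is zero). This is a minimal illustration under our own assumptions, not the paper's method: the names `P_by_action` and `best_stationary_policy` are hypothetical, and the brute-force enumeration is only for exposition; the policy improvement algorithm described above avoids enumerating policies.

```python
import numpy as np
from itertools import product

def spectral_radius(P):
    """Largest modulus among the eigenvalues of a square matrix P."""
    return max(abs(np.linalg.eigvals(P)))

def is_nilpotent(P, tol=1e-10):
    """An n-by-n matrix is nilpotent iff its nth power is zero."""
    n = P.shape[0]
    return np.allclose(np.linalg.matrix_power(P, n), 0, atol=tol)

def best_stationary_policy(P_by_action):
    """Enumerate all stationary policies and return the one whose
    transition matrix has the largest spectral radius.

    P_by_action[s][a] is the row of nonnegative transition weights
    out of state s under action a (not necessarily stochastic in the
    multiplicative setting).
    """
    n = len(P_by_action)
    best = None
    for choice in product(*[range(len(P_by_action[s])) for s in range(n)]):
        # Transition matrix of the stationary policy picking
        # action choice[s] in each state s.
        P = np.array([P_by_action[s][choice[s]] for s in range(n)])
        rho = spectral_radius(P)
        if best is None or rho > best[1]:
            best = (choice, rho, not is_nilpotent(P))
    return best  # (action per state, spectral radius, nonnilpotent flag)
```

If every stationary policy yields a nilpotent matrix, the flag returned above is false for the maximizer, which corresponds to the case in which the set of optimal policies under the cumulative average criteria usually contains no stationary policy.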