Abstract

This paper deals with computational algorithms for obtaining the optimal stationary policy and the minimum cost of a discounted semi-Markov decision process. Van Nunen [23] proposed a modified policy iteration algorithm with a suboptimality test of MacQueen type. The modified policy iteration algorithm is a policy iteration method whose policy evaluation routine performs a finite number of iterations of successive approximations; it includes the method of successive approximations and the policy iteration method as special cases. This paper devises a modified policy iteration algorithm with a suboptimality test of Hastings and Mello type and proves that it constructs a finite sequence of policies whose last element is either a unique optimal policy or an ε-optimal policy. Moreover, a new notion of equivalent decision processes is introduced, and many iterative methods for solving a system of linear equations, such as the Jacobi method, the simultaneous overrelaxation method, the Gauss-Seidel method, the successive overrelaxation method, and the stationary Richardson's method, are shown to convert the original semi-Markov decision process into equivalent decision processes. Various transformed algorithms are derived by applying the modified policy iteration algorithm with the suboptimality test to those equivalent decision processes. Numerical comparisons are made for Howard's automobile replacement problem. They show that the modified policy iteration algorithm with the suboptimality test is much more efficient than van Nunen's algorithm and is superior to the policy iteration method, linear programming, and some transformed algorithms.
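
To make the idea of a truncated policy evaluation routine concrete, the following is a minimal Python sketch of modified policy iteration. It is not the algorithm of this paper: it assumes an ordinary discrete-time discounted Markov decision process rather than a semi-Markov one, it omits the suboptimality test entirely, and it stops on a plain sup-norm criterion. The function name, the data layout (`P[s][a]` as a list of transition probabilities, `c[s][a]` as a one-step cost), and the parameters `m`, `tol` and `max_iter` are all illustrative assumptions.

```python
def modified_policy_iteration(P, c, beta, m, tol=1e-10, max_iter=10_000):
    """Sketch of modified policy iteration for a discounted-cost MDP.

    Assumed (hypothetical) layout: P[s][a][t] is the probability of moving
    from state s to state t under action a, c[s][a] is the one-step cost,
    beta in (0, 1) is the discount factor, and m >= 1 is the number of
    successive-approximation sweeps per policy (m = 1 gives the method of
    successive approximations; large m approaches policy iteration).
    """
    n = len(P)
    v = [0.0] * n  # initial value estimate

    def backup(s, a, values):
        # One-step cost plus discounted expected cost-to-go.
        return c[s][a] + beta * sum(p * values[t] for t, p in enumerate(P[s][a]))

    for _ in range(max_iter):
        # Policy improvement: choose a cost-minimizing action in each state.
        policy = [
            min(range(len(P[s])), key=lambda a, s=s: backup(s, a, v))
            for s in range(n)
        ]
        w = [backup(s, policy[s], v) for s in range(n)]

        # Simple sup-norm stopping rule (an assumption of this sketch).
        if max(abs(w[s] - v[s]) for s in range(n)) < tol:
            return policy, w

        # Partial policy evaluation: m sweeps in total, the first being w.
        v = w
        for _ in range(m - 1):
            v = [backup(s, policy[s], v) for s in range(n)]

    raise RuntimeError("no convergence within max_iter iterations")


# Tiny illustrative problem (made-up numbers): two states, two actions each.
P_example = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: stay, or move to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: stay, or move to state 0
]
c_example = [
    [1.0, 1.5],
    [0.0, 5.0],
]
policy_example, value_example = modified_policy_iteration(
    P_example, c_example, beta=0.5, m=3
)
```

In this sketch the single parameter `m` interpolates between the two special cases named in the abstract, which is why the evaluation loop runs `m - 1` extra sweeps after the improvement step.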
