Abstract

A steady-state optimal control problem is considered for nearly completely decomposable Markov chains. To apply the policy iteration method of R.A. Howard (Dynamic Programming and Markov Processes, Cambridge, MA: MIT Press, 1960), a high-dimensional, ill-conditioned system of algebraic equations must be solved in the value-determination step. Although algorithms exist for aggregating the steady-state probability distribution problem, they provide methods for computing only the cost, not the dual variables. Using a singular perturbation approach, an aggregation method for the value-determination equation is developed in three steps. First, a class of similarity transformations that put the system into singularly perturbed form is constructed. Second, an aggregation method for computing the steady-state probability distribution is derived. Third, this aggregation method is applied to the value-determination step of Howard's method.
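
Only the abstract is available here, but the two standard ingredients it builds on can be sketched: Courtois-type aggregation/disaggregation for the stationary distribution of a nearly completely decomposable (NCD) chain, and Howard's value-determination step for a fixed policy. The Python sketch below is illustrative only: the example matrix, the partition `blocks`, and all function names are assumptions, and it uses plain row-normalized diagonal blocks rather than the paper's similarity transformation to singularly perturbed form.

```python
# Minimal sketch, NOT the paper's algorithm: Courtois-style aggregation
# for an NCD chain, plus Howard's value-determination step for comparison.
import numpy as np

def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix P."""
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with the normalization sum(pi) = 1.
    A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def aggregated_stationary(P, blocks):
    """Approximate the stationary distribution of an NCD chain from its
    block structure (Courtois-style aggregation/disaggregation)."""
    # Step 1: within-block conditional distributions from the
    # row-normalized diagonal blocks.
    u = []
    for idx in blocks:
        B = P[np.ix_(idx, idx)]
        B = B / B.sum(axis=1, keepdims=True)   # restrict chain to the block
        u.append(stationary(B))
    # Step 2: small aggregated chain between blocks.
    K = len(blocks)
    C = np.zeros((K, K))
    for k, ik in enumerate(blocks):
        for l, il in enumerate(blocks):
            C[k, l] = u[k] @ P[np.ix_(ik, il)].sum(axis=1)
    xi = stationary(C)
    # Step 3: disaggregate block weights into a full distribution.
    pi = np.zeros(P.shape[0])
    for k, idx in enumerate(blocks):
        pi[idx] = xi[k] * u[k]
    return pi

def value_determination(P, c, ref=0):
    """Howard's value-determination step for a fixed policy:
    solve g*1 + h = c + P h with h[ref] = 0; returns gain g and biases h."""
    n = P.shape[0]
    A = np.eye(n) - P
    A[:, ref] = 1.0          # column for the reference state carries g
    x = np.linalg.solve(A, c)
    g = x[ref]
    h = x.copy(); h[ref] = 0.0
    return g, h

# Illustrative NCD chain: two tightly coupled pairs, weak coupling eps.
eps = 1e-4
P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.0, 0.0, 0.2, 0.8]])
P = (1 - eps) * P + eps * np.ones((4, 4)) / 4   # weak inter-block coupling
c = np.array([1.0, 2.0, 4.0, 0.5])
blocks = [[0, 1], [2, 3]]

pi = aggregated_stationary(P, blocks)
g, h = value_determination(P, c)
print("aggregated pi:", pi, " gain from pi:", pi @ c, " exact gain:", g)
```

For small coupling eps the aggregated distribution agrees with the exact one up to terms of order eps, which is why the aggregated gain pi @ c is a useful surrogate when the full value-determination system is ill-conditioned; the paper's contribution, per the abstract, is to extend this kind of aggregation from the cost to the dual variables of the value-determination equation.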
