Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for each state the relative frequency with which each action is used. These conditional frequencies, which are defined pathwise, are shown to determine the state-action frequencies that, in the finite case, are known to determine the costs. This result is extended to the countable case, allowing for unbounded costs. The space of frequencies is shown to be compact and convex, and its extreme points are identified with stationary deterministic policies. Conditions are given under which the search for optimality in several optimization problems may be restricted to stationary policies. These problems include the standard Markov decision process as well as constrained optimization (both in terms of average cost functionals) and variability-sensitive optimization. An application to a queueing problem is given, where these results imply the existence and explicit computation of optimal policies in constrained optimization problems. Because the conditional frequencies are defined pathwise, their values can be controlled directly; moreover, they depend only on the limiting behavior of the control. This has immediate application to adaptive control of Markov chains, including adaptive control under constraints.
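In the finite case alluded to above, optimization over state-action frequencies reduces to a linear program over the (compact, convex) set of frequency vectors: the balance equations pin down the feasible occupation measures, an extra linear inequality encodes an average-cost constraint, and normalizing the optimal frequencies by state recovers a stationary (possibly randomized) policy. The following is a minimal sketch of that standard formulation; the 2-state, 2-action MDP, all numerical values, and the constraint level 0.4 are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP (all numbers chosen for illustration).
# P[s, a, s2] = probability of moving from state s to s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions from state 1
])
c = np.array([[1.0, 3.0], [2.0, 0.5]])   # average cost to minimize
d = np.array([[0.0, 1.0], [1.0, 0.0]])   # constraint cost; require E[d] <= 0.4

nS, nA = c.shape
# Decision variable x[s, a]: long-run state-action frequencies, flattened.
# Balance: for each s2, sum_a x[s2, a] = sum_{s, a} x[s, a] * P[s, a, s2].
A_eq = np.zeros((nS + 1, nS * nA))
for s2 in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[s2, s * nA + a] = (s == s2) - P[s, a, s2]
A_eq[nS, :] = 1.0                          # frequencies sum to one
b_eq = np.zeros(nS + 1)
b_eq[nS] = 1.0

res = linprog(c.ravel(), A_ub=[d.ravel()], b_ub=[0.4],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)
# Conditional frequencies: a randomized stationary policy, one row per state.
policy = x / x.sum(axis=1, keepdims=True)
```

Note the design choice this reflects: an optimal solution of the LP is an extreme point of the feasible set, so in the unconstrained case it corresponds to a stationary deterministic policy, while each active frequency constraint can force randomization in at most one additional state.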