Abstract

In this chapter, we study the optimization of the long-run average reward of multi-class time-nonhomogeneous Markov chains (TNHMCs). We show that, with confluencity, state classification, and relative optimization, we can obtain necessary and sufficient conditions for optimal average-reward policies of TNHMCs consisting of multiple confluent classes (multi-chains). The optimality conditions need not hold on any finite period or on any “non-frequently visited” time sequence. In the analysis, we assume that the limit of the average exists. In general, the performance should be defined as the “liminf” of the average. However, because “liminf” is non-linear, the performance is not well-defined for branching states unless the TNHMC is “asynchronous” among different confluent classes. This asynchronous property is also studied.
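
The non-linearity of “liminf” can be illustrated with a short sketch. The notation here is assumed for illustration and is not taken from the chapter: f_n is the reward function at time n, X_n is the state at time n, a_N and b_N stand for the N-step average rewards in two confluent classes, and p is the probability that a branching state eventually joins the first class.

```latex
% Long-run average reward from initial state i (assumed notation)
\eta(i) = \liminf_{N \to \infty} \frac{1}{N}
          E\Big[ \sum_{n=0}^{N-1} f_n(X_n) \,\Big|\, X_0 = i \Big]

% "liminf" is superadditive, not linear: for 0 \le p \le 1,
\liminf_{N \to \infty} \big( p\, a_N + (1-p)\, b_N \big)
  \;\ge\; p \liminf_{N \to \infty} a_N + (1-p) \liminf_{N \to \infty} b_N

% The inequality can be strict. Purely numerical illustration:
% a_N = (-1)^N, b_N = -a_N, p = 1/2
\liminf_{N \to \infty} \big( \tfrac12 a_N + \tfrac12 b_N \big) = 0
  \;>\; \tfrac12(-1) + \tfrac12(-1) = -1
```

Under these assumptions, the “liminf” of a probability-weighted mixture can differ from the weighted sum of the per-class “liminf” values, which is the kind of difficulty that arises at branching states.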
