Abstract

This paper illustrates how value iteration can be used in a zero-sum game to obtain structural results on the optimal (equilibrium) value and policy, through the following example. We consider the dynamic flow control of customers arriving at a finite buffer. The service rate may depend on the state of the system, may change over time, and is unknown to the controller. The controller's goal is to design a policy that guarantees the best performance under worst-case service conditions. The cost is composed of a holding cost, a cost for rejecting customers, and a cost that depends on the quality of service. We consider both the discounted and the expected average cost criteria. The problem is studied in the framework of zero-sum Markov games, where the server, called player 1, is assumed to play against the flow controller, called player 2. Each player is assumed to know all previous actions of both players as well as the current and past states of the system. We show that there exist optimal policies for both players that are stationary (that is, they do not depend on time). A value iteration algorithm is used to obtain monotonicity properties of the optimal policies. When only two actions are available to one of the players, we show that this player's optimal policy is of threshold type, and that optimal policies exist for both players that require randomization in at most one state.
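
For orientation, the value iteration referred to above can be sketched, for the discounted criterion, in the standard form used for zero-sum Markov games. The notation here is an assumption and is not taken from the abstract: x is the buffer state, a and b are the actions of player 1 (the server, taken here as the maximizer of the cost) and player 2 (the flow controller, the minimizer), c is the one-step cost, P is the transition law, β in (0,1) is the discount factor, and val denotes the value of the one-shot matrix game over possibly randomized actions.

```latex
% Assumed notation (not from the abstract): state x, actions a (server) and
% b (flow controller), one-step cost c, transition law P, discount factor \beta.
\[
  V_0(x) = 0, \qquad
  V_{n+1}(x) \;=\; \operatorname*{val}_{a,\,b}
  \Bigl[\, c(x,a,b) \;+\; \beta \sum_{y} P(y \mid x,a,b)\, V_n(y) \Bigr].
\]
```

Under this sketch, a structural property such as monotonicity would be established by showing that it holds for V_0 and is preserved from V_n to V_{n+1}, so that it carries over to the limit.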
