Abstract

We consider a generic Markov decision process (MDP) with two controls: one taking effect immediately and the other whose effect is delayed by a positive lead time. Computing the optimal policy of this MDP is difficult when the lead time is large. However, as the lead time grows, the effect of the delayed action should depend only weakly on the current state, so decoupling the delayed action from the current state could intuitively yield good controls. The purpose of this paper is to substantiate this decoupling intuition for some MDPs by establishing asymptotic optimality of semi-open-loop policies, which specify open-loop controls for the delayed action and closed-loop controls for the immediate action. Specifically, we show that for an MDP with a fast mixing property and uniformly bounded cost functions, certain periodic semi-open-loop policies are asymptotically optimal. For a classical lost-sales inventory model with divisible products, we provide an elementary proof of the asymptotic optimality of constant-order policies. For the same model with indivisible products and integral order quantities, we prove that a special integral open-loop policy, referred to as the bracket policy, is asymptotically optimal. Our approach relies on a natural lower bound, provided by the optimal semi-open-loop policies for a finite-horizon problem whose horizon length equals the lead time of the original model. We show that as the horizon length becomes large, the long-run average cost incurred by certain specific semi-open-loop policies approaches the lower bound.
