On the convergence of optimal actions for Markov decision processes and the optimality of (s, S) inventory policies

Eugene A. Feinberg, Mark E. Lewis

doi:10.1002/nav.21750

Abstract

This article studies convergence properties of optimal values and actions for discounted and average‐cost Markov decision processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic‐review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs with possibly non‐compact action sets and unbounded cost functions: (i) convergence of value iterations to optimal values for discounted problems with possibly non‐zero terminal costs, (ii) convergence of optimal finite‐horizon actions to optimal infinite‐horizon actions for total discounted costs, as the time horizon tends to infinity, and (iii) convergence of optimal discount‐cost actions to optimal average‐cost actions for infinite‐horizon problems, as the discount factor tends to 1. Applied to the setup‐cost inventory control problem, the general results on MDPs imply the optimality of (s, S) policies and convergence properties of optimal thresholds. In particular, this article analyzes the setup‐cost inventory control problem without two assumptions often used in the literature: (a) the demand is either discrete or continuous, or (b) the backordering cost is higher than the cost of backordered inventory if the amount of backordered inventory is large.

© 2017 Wiley Periodicals, Inc. Naval Research Logistics 65: 619–637, 2018
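To make the objects in the abstract concrete, the following is a minimal illustrative sketch of value iteration for a discounted setup‐cost inventory problem with backorders. It is not the authors' method or setting: it assumes integer inventory levels truncated to a finite grid, a small discrete demand distribution, zero terminal costs, and piecewise‐linear convex holding/backordering costs. All names and parameter values (`solve_inventory`, `thresholds`, `K`, `c`, `h_pos`, `h_neg`, `alpha`, `lo`, `hi`) are hypothetical choices made for illustration.

```python
import numpy as np

def solve_inventory(K=5.0, c=1.0, h_pos=1.0, h_neg=4.0, alpha=0.9,
                    demand=(0, 1, 2, 3), probs=(0.1, 0.4, 0.3, 0.2),
                    lo=-20, hi=20, n_iter=200):
    """Value iteration on a truncated integer grid (illustrative assumption)."""
    states = np.arange(lo, hi + 1)
    demand = np.asarray(demand)
    probs = np.asarray(probs)
    n = len(states)
    v = np.zeros(n)            # zero terminal cost (assumption)
    policy = states.copy()     # policy[i] = inventory level after ordering
    for _ in range(n_iter):
        # g[j]: purchase-cost term c*y plus expected holding/backorder cost
        # and discounted continuation value for post-order level y.
        g = np.empty(n)
        for j, y in enumerate(states):
            nxt = y - demand
            stage = np.where(nxt >= 0, h_pos * nxt, -h_neg * nxt)
            idx = np.clip(nxt, lo, hi) - lo   # truncate next state to the grid
            g[j] = c * y + probs @ (stage + alpha * v[idx])
        new_v = np.empty(n)
        for i, x in enumerate(states):
            best, target = g[i], x            # option 1: do not order
            if i + 1 < n:
                j = i + 1 + int(np.argmin(g[i + 1:]))
                if K + g[j] < best:           # option 2: pay setup cost K
                    best, target = K + g[j], states[j]
            new_v[i] = best - c * x
            policy[i] = target
        v = new_v
    return states, v, policy

def thresholds(states, policy):
    """Return (s, S) read off the highest ordering state, or None if none."""
    ordering = policy > states
    if not ordering.any():
        return None
    s = int(states[ordering].max())
    S = int(policy[states == s][0])
    return s, S
```

Calling `thresholds(*solve_inventory()[::2])` reads off a candidate reorder point and order‐up‐to level from the computed policy. Rerunning with larger `n_iter`, or with `alpha` closer to 1, gives a rough numerical view of the kind of threshold convergence the abstract describes, within the limits of the grid truncation.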
