Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning

Tapas K Das, Sridhar Mahadevan, Abhijit Gosavi, Nicholas Marchalleck

doi:10.1287/mnsc.45.4.560

Abstract

A large class of sequential decision-making problems under uncertainty, in which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational cost of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow intractably with the size of the problem and its related data. Furthermore, these techniques require, for each action, the one-step transition probability and reward matrices, and obtaining these is often unrealistic for large and complex systems. Recently, there has been much interest in a simulation-based stochastic approximation framework called reinforcement learning (RL) for computing near-optimal policies for MDPs. RL has been successfully applied to very large problems, such as elevator scheduling and dynamic channel allocation in cellular telephone systems. In this paper, we extend RL to a more general class of decision tasks referred to as semi-Markov decision problems (SMDPs). In particular, we focus on SMDPs under the average-reward criterion. We present a new model-free RL algorithm called SMART (Semi-Markov Average Reward Technique). We present a detailed study of this algorithm on a combinatorially large problem: determining the optimal preventive maintenance schedule of a production-inventory system. Numerical results from both the theoretical model and the RL algorithm are presented and compared.
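The abstract names the algorithm but does not state its update rule. As a rough illustration of what a model-free, average-reward update for an SMDP can look like, the sketch below applies one temporal-difference step in which the reward-rate estimate is scaled by the sojourn time of the transition. The names `SmartState` and `smart_update`, the toy states and actions, and the specific form of the rule are assumptions made for illustration; they are not taken from the paper and may differ from SMART as published.

```python
from dataclasses import dataclass, field


@dataclass
class SmartState:
    """Learner state for an average-reward SMDP Q-learning sketch (hypothetical)."""
    q: dict = field(default_factory=dict)  # (state, action) -> relative value
    total_reward: float = 0.0              # reward accumulated on greedy steps
    total_time: float = 0.0                # sojourn time accumulated on greedy steps
    rho: float = 0.0                       # estimated reward per unit time


def smart_update(learner, state, action, reward, tau, next_state, actions,
                 alpha=0.1, was_greedy=True):
    """Apply one update after observing (state, action, reward, tau, next_state).

    tau is the time spent in the transition; in an ordinary MDP it would be 1.
    """
    q = learner.q
    best_next = max(q.get((next_state, b), 0.0) for b in actions)
    # Assumed rule: charge the current reward-rate estimate for the elapsed time.
    target = reward - learner.rho * tau + best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    # Refresh the reward-rate estimate only on non-exploratory steps.
    if was_greedy:
        learner.total_reward += reward
        learner.total_time += tau
        learner.rho = learner.total_reward / learner.total_time
    return q[(state, action)]


# Toy usage with made-up transitions for a machine that can be "good" or "worn".
demo = SmartState()
demo_actions = ["produce", "maintain"]
smart_update(demo, "good", "produce", reward=10.0, tau=2.0,
             next_state="worn", actions=demo_actions, alpha=0.5)
smart_update(demo, "worn", "maintain", reward=-2.0, tau=1.0,
             next_state="good", actions=demo_actions, alpha=0.5)
print(demo.q, demo.rho)
```

Because the function needs only sampled transitions and their durations, it can be driven by a simulator without the transition probability and reward matrices that value iteration and policy iteration require.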

Journal: Management Science
Publication Date: Apr 1, 1999
Citations: 217

Similar Papers

Solving sequential decision-making problems under virtual reality simulation system
Yang Xianglong ... Feng Yuncheng
09 Dec 2001

A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis
Abhijit Gosavi
Machine Learning | VOL. 55
01 Apr 2004

Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V Denardo
SIAM Review | VOL. 9
01 Apr 1967

Average Reward Reinforcement Learning for Semi-Markov Decision Processes
Jiayuan Yang ... Yanjie Li
01 Jan 2017
