Abstract

In this paper we consider constrained optimization of discrete-time Markov Decision Processes (MDPs) with finite state and action spaces, which accumulate both a reward and costs at each decision epoch. We study the problem of finding a policy that maximizes the expected total discounted reward subject to the constraints that the expected total discounted costs do not exceed given values. To this end, we investigate a decomposition method that partitions the state space into strongly communicating classes in order to compute an optimal or a nearly optimal stationary policy. The discounted criterion has many applications in areas such as forest management, energy consumption management, finance, communication systems (mobile networks), and artificial intelligence.
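Stated precisely, the problem is the standard constrained discounted formulation sketched below; the symbols (policy π, discount factor α, costs c_k with bounds C_k, initial state s₀) follow common usage for constrained MDPs and are an assumption about, not a quotation of, the paper's notation:

```latex
\max_{\pi}\; \mathbb{E}^{\pi}_{s_0}\!\left[\sum_{t=0}^{\infty} \alpha^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}^{\pi}_{s_0}\!\left[\sum_{t=0}^{\infty} \alpha^{t}\, c_k(s_t, a_t)\right] \le C_k,
\qquad k = 1, \dots, K
```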

Highlights

  • The decomposition method consists in dividing the state space into subsets which are weakly coupled

  • In this work we model the environment as a Constrained Markov Decision Process, defined by a tuple (S, A, P, r, c, C, α, s₀), where S is the set of states, A is the set of actions, P(s′ | s, a) is the transition probability, r(s, a) is the reward function denoting the immediate reward incurred by taking action a in state s, c(s, a) is the cost function upper bounded by the cost constraint C, α is the discount factor, and s₀ is the fixed initial state (see the sketch after this list)

  • We solve the problem of constrained discounted Markov Decision Processes in steps, exploiting the decomposition of the state space into strongly communicating classes
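As a concrete illustration of this tuple, here is a minimal sketch in Python: it packs (S, A, P, r, c, C, α, s₀) into a small container and solves the constrained discounted problem via the classical occupation-measure linear program, not via the paper's decomposition method. The names (CMDP, solve_cmdp_lp) and the use of scipy are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): a constrained discounted MDP
# solved through the classical occupation-measure linear program.
from dataclasses import dataclass
import numpy as np
from scipy.optimize import linprog


@dataclass
class CMDP:
    P: np.ndarray      # transition probabilities, shape (S, A, S)
    r: np.ndarray      # rewards, shape (S, A)
    c: np.ndarray      # costs, shape (K, S, A)
    C: np.ndarray      # cost bounds, shape (K,)
    alpha: float       # discount factor in (0, 1)
    s0: int            # fixed initial state


def solve_cmdp_lp(m: CMDP):
    """Maximize discounted reward subject to discounted cost constraints.

    Variables are discounted occupation measures rho(s, a) >= 0 with flow
    constraints  sum_a rho(s', a) - alpha * sum_{s,a} P[s,a,s'] rho(s,a)
    = 1{s' = s0}.  A stationary (possibly randomized) optimal policy is
    recovered by normalizing rho over actions.
    """
    S, A = m.r.shape
    n = S * A
    # Bellman-flow equality constraints, one row per state s'.
    A_eq = np.zeros((S, n))
    for s in range(S):
        for a in range(A):
            col = s * A + a
            A_eq[s, col] += 1.0                      # outflow from (s, a)
            A_eq[:, col] -= m.alpha * m.P[s, a, :]   # discounted inflow
    b_eq = np.zeros(S)
    b_eq[m.s0] = 1.0
    # Cost constraints: sum_{s,a} rho(s,a) c_k(s,a) <= C_k.
    A_ub = m.c.reshape(len(m.C), n)
    # linprog minimizes, so negate the reward objective.
    res = linprog(-m.r.reshape(n), A_ub=A_ub, b_ub=m.C,
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    assert res.success, res.message
    rho = res.x.reshape(S, A)
    policy = rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)
    return policy, -res.fun
```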


Summary

Introduction

The decomposition method consists in dividing the state space into subsets which are weakly coupled. This technique was first introduced by Bather [1]. Subsequently, Ross and Varadarajan [5] presented a similar decomposition method to solve the constrained long-run average Markov Decision Process problem. In this decomposition, the state space is partitioned into strongly communicating classes and a (possibly empty) set of transient states. We solve the problem of constrained discounted Markov Decision Processes in steps, exploiting this decomposition of the state space into strongly communicating classes, as sketched below.
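As a rough illustration of the partitioning step, the sketch below groups states into the strongly connected components (SCCs) of the one-step reachability graph (an edge s → s′ whenever some action reaches s′ with positive probability). This is a simplification: SCCs give the communicating classes, while the strongly communicating classes of Ross and Varadarajan [5] additionally require a stationary policy under which the class is recurrent, so states grouped here may fail that stronger test. The function name and the SCC shortcut are assumptions for illustration, not the paper's algorithm.

```python
# Simplified sketch of the partitioning step: communicating classes via SCCs.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def communicating_classes(P: np.ndarray):
    """P has shape (S, A, S); returns (labels, transient_states).

    States s, s' are grouped together when each is reachable from the
    other using some sequence of actions with positive probability.
    """
    S = P.shape[0]
    # Adjacency: edge s -> s' if some action moves s to s' with prob > 0.
    adj = (P.max(axis=1) > 0).astype(int)
    _, labels = connected_components(csr_matrix(adj),
                                     directed=True, connection='strong')
    # A singleton class with no action keeping the state in place must
    # eventually leave and can never return, so it is transient under
    # every policy.
    transient = [s for s in range(S)
                 if (labels == labels[s]).sum() == 1 and adj[s, s] == 0]
    return labels, transient


# Example: 3 states, 2 actions; states 0 and 1 swap, state 2 absorbs.
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1.0; P[0, 1, 2] = 1.0   # from 0: go to 1, or escape to 2
P[1, 0, 0] = 1.0; P[1, 1, 2] = 1.0   # from 1: go to 0, or escape to 2
P[2, :, 2] = 1.0                      # state 2 absorbs
print(communicating_classes(P))      # {0, 1} one class, {2} another
```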

Preliminaries
Decomposition theory
While … for some … DO
Restricted MDPs
Intermediate MDP
An optimal policy for the original MDP
Conclusion