Unbounded Rewards Research Articles

In this paper we consider several variants of the standard successive-approximation method for (semi-)Markov decision processes with unbounded rewards. Wessels and van Nunen have shown that a class of variants can be generated by randomized stopping times, where the probability of "stopping" the process at time n is independent of the actions taken up to time n. In this paper we allow the stopping time to depend on the actions as well as on the states. This makes it possible to extend the class of solution techniques so that properties of the reward and transition structure that depend on the actions can be exploited in the development of appropriate successive-approximation methods. For a special action-dependent stopping time, the corresponding algorithm possesses the so-called "equal-row-sum" property, which can be used, for example, to transform semi-Markov decision processes into ordinary Markov decision processes. Moreover, the equal-row-sum property permits the construction of good upper and lower bounds on the value function by extrapolation, as well as the elimination of non-optimal actions.
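The equal-row-sum idea can be illustrated on a finite toy model with bounded rewards, so the sketch below does not capture the unbounded-reward setting studied here. A discounted semi-Markov decision process leads to a successive-approximation scheme whose transition rows are substochastic, with row sums β(i,a) < 1 that depend on the state and the action. The code uses one standard rescaling, which is not necessarily the exact stopping-time construction of the paper: each (i,a) row and reward is multiplied by c(i,a) = (1 - β*)/(1 - β(i,a)) with β* = max β(i,a), and the remaining mass 1 - c(i,a) is put on a self-transition. Every row then sums to β*, and because c(i,a) > 0 the fixed point of the optimality equation is unchanged. The function names, data layout, and toy numbers are hypothetical.

```python
# Minimal sketch (hypothetical names): equal-row-sum rescaling of a finite
# successive-approximation scheme with action-dependent row sums beta(i, a) < 1.

def bellman(v, r, P):
    """One step: (Uv)(i) = max_a [ r(i,a) + sum_j P[i][a][j] * v(j) ]."""
    return [max(r[i][a] + sum(p * x for p, x in zip(P[i][a], v))
                for a in range(len(r[i])))
            for i in range(len(v))]


def equal_row_sum(r, P):
    """Rescale every (i, a) so that all transition rows sum to beta_star."""
    beta = [[sum(row) for row in P_i] for P_i in P]
    beta_star = max(max(b_i) for b_i in beta)
    assert beta_star < 1, "every row sum must be strictly below 1"
    r_new, P_new = [], []
    for i, P_i in enumerate(P):
        r_row, P_row = [], []
        for a, row in enumerate(P_i):
            c = (1 - beta_star) / (1 - beta[i][a])  # 0 < c <= 1
            new_row = [c * p for p in row]
            new_row[i] += 1 - c                     # leftover mass: self-transition
            r_row.append(c * r[i][a])
            P_row.append(new_row)
        r_new.append(r_row)
        P_new.append(P_row)
    return r_new, P_new, beta_star


def solve(r, P, n_iter=2000):
    """Plain successive approximations starting from v = 0."""
    v = [0.0] * len(r)
    for _ in range(n_iter):
        v = bellman(v, r, P)
    return v


if __name__ == "__main__":
    # Toy data: two states, two actions, row sums 0.9 or 0.6.
    r = [[1.0, 2.0], [0.0, 3.0]]
    P = [[[0.5, 0.4], [0.1, 0.5]], [[0.3, 0.3], [0.0, 0.9]]]
    r2, P2, b = equal_row_sum(r, P)
    print(b, solve(r, P), solve(r2, P2))
```

Both runs converge to the same values; the benefit of the rescaled model is that all rows share the single factor β*, which is the situation in which MacQueen-type extrapolation bounds apply directly.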

The aim of this paper is to give a survey of recent developments in the area of successive approximations for Markov decision processes and Markov games. We emphasize two aspects, namely the conditions under which successive approximations converge in some strong sense, and variations of these methods that reduce the amount of computational work required. With respect to the first aspect, it is shown how much unboundedness of the rewards may be allowed without violating convergence. With respect to the second aspect, we present four ideas, which can be applied in combination, that may reduce the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MacQueen bounds), 2. the use of alternative policy improvement procedures (based on stopping times), 3. a better evaluation of the values of actual policies in each iteration step by a value-oriented approach, 4. the elimination of suboptimal ac...
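Ideas 1 and 4 can be combined in a few lines for a finite discounted MDP with bounded rewards, stochastic transition rows, and a common discount factor β. This is a minimal sketch under those assumptions, not the survey's method for the unbounded case. With d = v_{n+1} - v_n, MacQueen's bounds v_{n+1} + β/(1-β)·min d <= v* <= v_{n+1} + β/(1-β)·max d bracket the optimal value. An action whose Q-value, computed from the upper bound, still falls below the lower bound in its state cannot be optimal, so it is dropped for good; the optimal actions always survive, so the reduced problem keeps the same v*. The function name, the `slack` guard against rounding, and the toy data are hypothetical.

```python
# Minimal sketch (hypothetical name and parameters): value iteration for a finite
# discounted MDP with MacQueen bounds and elimination of suboptimal actions.

def macqueen_value_iteration(r, P, beta, tol=1e-8, max_iter=10_000, slack=1e-9):
    """r[i][a]: reward; P[i][a]: probabilities p(j | i, a) summing to 1; 0 < beta < 1.

    Returns (midpoint estimate, lower bound, upper bound, remaining actions, iterations).
    """
    n = len(r)
    active = [set(range(len(r[i]))) for i in range(n)]  # actions not yet eliminated
    k = beta / (1.0 - beta)
    v = [0.0] * n
    for it in range(1, max_iter + 1):
        # Successive-approximation step over the remaining actions only.
        v_new = [max(r[i][a] + beta * sum(p * x for p, x in zip(P[i][a], v))
                     for a in active[i])
                 for i in range(n)]
        d = [x - y for x, y in zip(v_new, v)]
        lo_shift, up_shift = k * min(d), k * max(d)
        lower = [x + lo_shift for x in v_new]  # MacQueen lower bound on v*
        upper = [x + up_shift for x in v_new]  # MacQueen upper bound on v*
        v = v_new
        if up_shift - lo_shift < tol:
            break
        # a cannot be optimal in i if its Q-value, evaluated with the upper
        # bound on v*, is still below the lower bound on v*(i).
        for i in range(n):
            for a in list(active[i]):
                q_up = r[i][a] + beta * sum(p * u for p, u in zip(P[i][a], upper))
                if q_up < lower[i] - slack:
                    active[i].discard(a)
    mid = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    return mid, lower, upper, active, it


if __name__ == "__main__":
    r = [[1.0, 2.0], [0.0, 3.0]]
    P = [[[0.5, 0.5], [0.1, 0.9]], [[0.7, 0.3], [0.0, 1.0]]]
    mid, lower, upper, active, it = macqueen_value_iteration(r, P, beta=0.9)
    print(mid, active, it)
```

Stopping on the width of the bracket rather than on ||v_{n+1} - v_n|| is the point of idea 1: the midpoint of the bounds is usually far more accurate than the last iterate itself.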

Articles published on Unbounded Rewards

Semi-Markov and Jump Markov Controlled Models: Average Cost Criterion

Finite-state approximations for denumerable state discounted Markov decision processes

Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards

Finite state approximations for denumerable state infinite horizon discounted Markov decision processes with unbounded rewards

The average-optimal adaptive control of a Markov renewal model in presence of an unknown parameter

Action-dependent stopping times and Markov decision process with unbounded rewards

Optimal stopping in terminating one-sided processes with unbounded reward functions

On Markovian decision processes with unbounded rewards

Semi-Regenerative Processes with Unbounded Rewards

Successive approximations for Markov decision processes and Markov games with unbounded rewards

Note—A Note on Dynamic Programming with Unbounded Rewards

On Howard's policy improvement method

On Dynamic Programming with Unbounded Rewards

Semi-Markov Decision Processes with Unbounded Rewards

Optimal stopping of constrained Brownian motion

Discrete Dynamic Programming with Unbounded Rewards