Abstract

This paper considers risk-sensitive Markov decision processes (MDPs). The aim is to explore the effects of state transience and non-communication on the optimal control of the system. A vanishing discount approach is investigated, in which the approximating system is the MDP evaluated by the usual discounted exponential utility. It is shown that, after appropriate normalization, the optimal discounted value functions converge to the optimal risk-sensitive average values as the discount factor tends to 1. These value functions are shown to depend on a certain order structure of the state space. In this way it is also proved that the optimal policies for the discounted system converge to optimal policies for the risk-sensitive average system as the discount factor tends to 1. In the proofs, an ordered partition of the state space is introduced, which is closely related to the characteristics of state communication, transience, and absorption.
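For orientation, the two criteria contrasted above can be written out in the notation commonly used in the risk-sensitive MDP literature. This is a minimal sketch under assumed notation: the risk-sensitivity parameter \(\gamma > 0\), the running cost \(c\), and the exact normalization are standard placeholders, not taken from the paper, whose conventions may differ.

% Sketch in standard risk-sensitive MDP notation (assumed, not from the paper).
% J is the risk-sensitive average criterion; V_alpha is the log of the
% discounted exponential utility; alpha in (0,1) is the discount factor.
\[
  J(x,\pi) \;=\; \limsup_{n\to\infty} \frac{1}{n}
    \log \mathbb{E}_x^{\pi}\!\left[\exp\!\left(\gamma \sum_{t=0}^{n-1} c(x_t,a_t)\right)\right],
  \qquad
  V_\alpha(x,\pi) \;=\; \log \mathbb{E}_x^{\pi}\!\left[\exp\!\left(\gamma \sum_{t=0}^{\infty} \alpha^{t} c(x_t,a_t)\right)\right].
\]
In this notation, the vanishing discount result described above would take the form
\[
  \lim_{\alpha \uparrow 1} \, (1-\alpha)\, V_\alpha^{*}(x) \;=\; J^{*}(x),
\]
where starred quantities denote optimal values over policies \(\pi\).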
