This paper studies risk-sensitive discounted discrete-time Markov decision processes in Borel spaces, in which the reward functions may be unbounded both from above and from below. We identify mild conditions on the primitive data of the decision processes that not only ensure the existence of a solution to the optimality equation (OE for short), but also generalize the conditions known for the bounded-reward case. Furthermore, using the OE and a novel technique, we prove the existence of an optimal policy within the class of randomized history-dependent policies. Finally, we illustrate our results with an inventory system.