Abstract

This work concerns discrete-time Markov decision processes with finite state space and bounded costs per stage. The decision maker ranks random costs via the expected utility associated with a constant risk sensitivity coefficient, and the performance of a control policy is measured by the corresponding (long-run) risk-sensitive average cost criterion. The main structural restriction on the system is the following communication assumption: for every pair of states x and y, there exists a policy π, possibly depending on x and y, such that when the system evolves under π starting at x, the probability of reaching y is positive. Within this framework, the paper establishes the existence of solutions to the optimality equation whenever the constant risk sensitivity coefficient does not exceed a certain positive value.
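For context, the risk-sensitive average cost criterion and the associated optimality equation referred to above are usually written as follows in this literature; the notation here (cost function $C$, transition law $p$, risk coefficient $\lambda$, gain $g$, bias $h$) is a standard choice and not necessarily the paper's own:

```latex
% Risk-sensitive average cost of a policy \pi starting at state x
% (exponential-utility certainty equivalent, averaged over time):
J(\lambda,\pi,x) \;=\; \limsup_{n\to\infty}\,
  \frac{1}{\lambda n}\,
  \log \mathbb{E}_x^{\pi}\!\left[
    \exp\!\Big(\lambda \sum_{t=0}^{n-1} C(X_t, A_t)\Big)
  \right].

% The corresponding (multiplicative) optimality equation, whose
% solvability for small enough \lambda > 0 is the paper's subject:
e^{\lambda\,(g + h(x))} \;=\;
  \min_{a \in A(x)}\;
  e^{\lambda\, C(x,a)}
  \sum_{y} p_{xy}(a)\, e^{\lambda\, h(y)} .
```

Here $g$ plays the role of the optimal risk-sensitive average cost and $h$ is a relative value (bias) function; for $\lambda \to 0$ the equation formally reduces to the classical risk-neutral average cost optimality equation.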
