Abstract

We study discrete-time controlled Markov chains with finite state and action spaces. The performance of a control policy is measured by a risk-sensitive average criterion, the exponential average cost (EAC), which models risk sensitivity through an exponential (dis)utility function. The main result characterizes the EAC of an arbitrary stationary deterministic policy in terms of the spectral radii of suitable irreducible matrices; this generalizes a well-known theorem of Howard and Matheson (1972), which covers the particular case in which the transition probability matrix induced by the policy is primitive. We show that, when a stationary deterministic policy determines a single class of recurrent states, the corresponding EAC converges to the risk-null average cost as the risk-sensitivity coefficient tends to zero, whereas for large risk sensitivity fundamental differences arise between the two models. Relying on the Perron-Frobenius theory of non-negative matrices, we prove that the associated optimality equation admits solutions under a simultaneous Doeblin condition, provided the risk-sensitivity coefficient is small enough. Finally, we construct an example that shows the impact of risk sensitivity on the Hernandez-Hernandez condition for the existence of solutions to an optimality inequality.
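For orientation, the criterion and the primitive-case characterization it generalizes can be sketched as follows. This is a minimal LaTeX sketch in our own notation (\lambda for the risk-sensitivity coefficient, c for the one-stage cost, f for a stationary deterministic policy, \rho for the spectral radius); the paper's exact conventions may differ.

% Exponential average cost (EAC) of a policy \pi starting at state x,
% for a risk-sensitivity coefficient \lambda > 0:
J_\lambda(x, \pi) = \limsup_{n \to \infty} \frac{1}{\lambda n}
    \log E_x^{\pi}\!\left[ \exp\!\left( \lambda \sum_{t=0}^{n-1} c(X_t, A_t) \right) \right].

% Howard-Matheson (1972), primitive case: if the transition matrix
% P_f = [\, p_{xy}(f(x)) \,] induced by f is primitive, then, with
% M_\lambda := [\, e^{\lambda c(x, f(x))}\, p_{xy}(f(x)) \,],
J_\lambda(x, f) = \frac{1}{\lambda} \log \rho(M_\lambda)
    \quad \text{for every state } x.

Under this reading, the small-risk limit stated above is plausible: as \lambda \to 0, the derivative of \log \rho(M_\lambda) at zero recovers the stationary mean cost, i.e., the risk-null average cost, for a policy with a single recurrent class.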
