Abstract

This work considers Markov decision processes with a discrete state space. Assuming that the decision maker has a non-null constant risk-sensitivity, which leads to grading random rewards via the expectation of an exponential utility function, the performance index of a control policy is the risk-sensitive expected total-reward criterion corresponding to a nonnegative reward function. Within this framework, the existence of optimal and approximately optimal stationary policies in the absolute sense is studied. The main results can be summarised as follows: (i) an optimal stationary policy exists if the state and action sets are finite, whereas an ε-optimal stationary policy is guaranteed when just the state space is finite; (ii) this latter fact is used to show that, in the general denumerable state space case, ε-optimal stationary policies exist if the controller is risk-seeking and the optimal value function is bounded.
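For concreteness, the criterion described in the abstract is commonly formalised as follows (a sketch using standard notation from the risk-sensitive MDP literature; the symbols $\lambda$, $X_t$, $A_t$, and $R$ are assumptions, not taken from the paper itself). Given a risk-sensitivity coefficient $\lambda \neq 0$, random rewards are graded through the exponential utility

$$U_\lambda(x) = \operatorname{sign}(\lambda)\, e^{\lambda x},$$

and the risk-sensitive expected total reward of a policy $\pi$ starting at state $x$ is

$$V_\lambda(x, \pi) = \mathbb{E}_x^{\pi}\!\left[ U_\lambda\!\left( \sum_{t=0}^{\infty} R(X_t, A_t) \right) \right] = \operatorname{sign}(\lambda)\, \mathbb{E}_x^{\pi}\!\left[ \exp\!\left( \lambda \sum_{t=0}^{\infty} R(X_t, A_t) \right) \right],$$

which is well defined (possibly infinite) since the reward function satisfies $R \geq 0$. Under this convention, $\lambda > 0$ corresponds to a risk-seeking controller and $\lambda < 0$ to a risk-averse one, so result (ii) concerns the case $\lambda > 0$.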
