Abstract

This paper is the second part of our study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove that a stationary policy is Blackwell optimal in the class of all history-dependent policies if it is Blackwell optimal in the class of stationary policies.
