Abstract
This paper is the second part of our study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove that a stationary policy is Blackwell optimal in the class of all history-dependent policies if it is Blackwell optimal in the class of stationary policies.