Abstract

This paper focuses on bias optimality in unichain Markov decision processes with finite state and action spaces. Using relative value functions, we present methods for evaluating the optimal bias; this leads to a probabilistic analysis that transforms the original reward problem into a minimum average cost problem. The result is an explanation of how and why bias implicitly discounts future rewards.
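The quantities the abstract refers to can be illustrated for a fixed policy in a unichain chain: the gain (long-run average reward) and the bias (relative value function) satisfy the Poisson equation. The sketch below, with a hypothetical 3-state transition matrix and reward vector chosen purely for illustration, computes both via the stationary distribution; it is a minimal numerical example, not the paper's method.

```python
import numpy as np

# Hypothetical 3-state unichain Markov chain under a fixed policy
# (transition matrix P and reward vector r are illustrative assumptions).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])
n = len(r)

# Stationary distribution pi: solve pi P = pi with sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Gain g = pi . r  (long-run average reward, constant on a unichain).
g = pi @ r

# Bias h solves the Poisson equation (I - P) h = r - g*1,
# pinned down by the normalization pi . h = 0.
B = np.vstack([np.eye(n) - P, pi])
c = np.concatenate([r - g * np.ones(n), [0.0]])
h = np.linalg.lstsq(B, c, rcond=None)[0]

print("gain:", g)
print("bias:", h)
```

The bias h(s) measures the transient advantage of starting in state s before the average-reward regime takes over, which is the sense in which bias "implicitly discounts" future rewards: states whose early rewards exceed the gain accumulate positive bias.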
