Abstract

The use of the long-run average reward, or gain, as an optimality criterion has received considerable attention in the literature. However, for many practical models the gain has the undesirable property of being underselective; that is, there may be several gain-optimal policies. After finding the set of policies that achieve the primary objective of maximizing the long-run average reward, one might search within that set for the policy that maximizes the “short-run” reward. This reward, called the bias, aids in distinguishing among multiple gain-optimal policies. This chapter focuses on establishing the usefulness of the bias for this purpose, computing it, and demonstrating the implicit discounting captured by the bias on recurrent states.
