Abstract
Many economic decision-makers today rely on learning algorithms for important decisions. This paper shows that a widely used learning algorithm, ε-Greedy, exhibits emergent risk aversion, favoring actions with lower payoff variance. When presented with actions of the same expected payoff, under a wide range of conditions, ε-Greedy chooses the lower-variance action with probability approaching one. This emergent preference can have wide-ranging consequences, from inequity to homogenization, and holds transiently even when the higher-variance action has a strictly higher expected payoff. We discuss two methods to restore risk neutrality. The first method reweights data as a function of how likely an action is to be chosen. The second method employs optimistic payoff estimates for actions that have not been taken often.
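The mechanism can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experimental setup: it assumes two actions with the same expected payoff of zero, one deterministic and one paying +1 or -1 with equal probability, a constant exploration rate, sample-mean payoff estimates, and random tie-breaking. The function name `epsilon_greedy_low_variance_share` and all parameter values are hypothetical choices made for illustration.

```python
import random


def epsilon_greedy_low_variance_share(steps=2000, epsilon=0.1, seed=0):
    """Run epsilon-Greedy on two arms that share an expected payoff of 0.

    Arm 0 always pays 0 (zero variance); arm 1 pays +1 or -1 with equal
    probability. Returns the fraction of rounds in which arm 0 was chosen.
    These payoff distributions are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = [0, 0]      # number of pulls per arm
    totals = [0.0, 0.0]  # summed payoffs per arm
    safe_pulls = 0       # rounds in which the zero-variance arm was chosen

    for t in range(steps):
        if t < 2:
            arm = t  # pull each arm once to initialise the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(2)  # explore: pick an arm uniformly at random
        else:
            # Exploit: pick the arm with the higher sample-mean payoff.
            means = [totals[a] / counts[a] for a in range(2)]
            if means[0] == means[1]:
                arm = rng.randrange(2)  # break ties at random
            else:
                arm = 0 if means[0] > means[1] else 1

        payoff = 0.0 if arm == 0 else rng.choice((-1.0, 1.0))
        counts[arm] += 1
        totals[arm] += payoff
        safe_pulls += (arm == 0)

    return safe_pulls / steps


if __name__ == "__main__":
    # Average the share over independent runs to smooth out run-to-run noise.
    shares = [epsilon_greedy_low_variance_share(seed=s) for s in range(200)]
    print(sum(shares) / len(shares))
```

The intuition this sketch is meant to convey: when the risky arm's estimate is above zero it is pulled often, so an overestimate is corrected quickly; when the estimate is below zero the arm is pulled only during exploration, so an underestimate persists. Averaged over runs, the zero-variance arm should therefore be chosen in more than half of the rounds, even though both arms have the same expected payoff.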