Abstract

I analyze the effects of memory limitations on the endogenous learning behavior of an agent in a standard two-armed bandit problem. An infinitely lived agent chooses each period between two alternatives with unknown types, to maximize discounted payoffs. The agent can experiment with each alternative and receive payoffs that are partially informative about its type. The agent does not recall past actions or payoffs. Instead, the agent has a finite number of memory states as in Wilson (2004): he can condition his actions only on the memory state he is currently in, and he can update his memory state depending on the payoff received.

I find that the inclination to choose the currently better alternative does not constrain learning in the limit as discounting vanishes. Even though uncertainties are independent, the agent optimally holds correlated beliefs across memory states. Optimally, memory states reflect the magnitude of the relative ranking of alternatives. After a high payoff from one of the alternatives, the agent optimally moves to a memory state with more pessimistic beliefs about the other, even though no information about the latter alternative is received. For the case where one alternative is substantially more informative than the other, he chooses the latter only for myopic exploitation purposes and ignores any information about it, suggesting specialization in learning.

For the special case with one known (safe) alternative, a sufficiently patient agent never ceases experimentation and tries the unknown alternative at least occasionally after any history. Furthermore, he chooses the safe alternative with more optimistic beliefs than the optimal full-memory cutoff belief, suggesting under-experimentation. Both findings are counter to what theory predicts with full memory, but in agreement with experimental findings.
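The finite-memory setup described above can be sketched as a small simulation. The rules below are illustrative assumptions, not the paper's optimal policy: memory states are ordered by which arm currently looks better, the action depends only on the current state (no recall of past payoffs), and each payoff can only shift the state one step toward or away from the chosen arm.

```python
import random

def run_finite_memory_agent(p0, p1, n_states=4, horizon=10_000, seed=0):
    """Toy finite-memory agent for a two-armed Bernoulli bandit,
    in the spirit of Wilson (2004). All rule choices here are
    illustrative assumptions, not the paper's derived optimum."""
    rng = random.Random(seed)
    state = n_states // 2          # start in a middle (uncommitted) memory state
    total = 0
    for _ in range(horizon):
        # Action depends only on the current memory state:
        # lower states favor arm 0, higher states favor arm 1.
        arm = 0 if state < n_states // 2 else 1
        payoff = 1 if rng.random() < (p0 if arm == 0 else p1) else 0
        total += payoff
        # Transition: a success nudges memory toward the chosen arm's
        # states, a failure nudges it away -- the only "learning"
        # a memory-constrained agent can do.
        if (payoff == 1) == (arm == 1):
            state = min(n_states - 1, state + 1)
        else:
            state = max(0, state - 1)
    return total / horizon
```

Running this with unequal arm means (say `p0=0.3`, `p1=0.7`) shows the agent spending most periods on the better arm, so its average payoff lands well above the worse arm's mean even though it never stores a belief explicitly.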
