We test whether deviations from Nash equilibrium in rent-seeking contests can be explained by the slow convergence of payoff-based learning. We identify and eliminate two sources of noise that slow down learning. The first source of noise arises because each action is evaluated against a different sample of the other players' actions. We eliminate it by providing foregone payoff information, which allows all actions to be evaluated against the same sequence of opponents' actions. The second source of noise arises from payoff risk, which reduces the correlation between expected and realized payoffs. We manipulate payoff risk using a 2×2 design: payoffs from contest investments are either risky (as in standard contests) or safe (as in proportional contests), and payoffs from the part of the endowment not invested in the contest are also either safe (as in standard contests) or risky. We find that rates of Nash equilibrium play reach 100% when payoff risk is absent and foregone payoff information is available, but remain at most 20% in all other treatments. This result can be explained by payoff-based learning but not by other theories that might interact with payoff risk (non-monetary utility of winning, risk-seeking preferences, spitefulness, probability weighting, quantal response equilibrium). We propose a hybrid learning model that combines reinforcement and belief learning with preferences, and show that it fits the data well, mostly because of its reinforcement-learning component. Additional support for learning comes from the persistence of the Nash equilibrium after foregone payoff information is removed.
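To make the abstract's mechanics concrete, here is a minimal sketch of the two ingredients it names: the payoff structure of a standard (risky-prize) Tullock contest and an EWA-style hybrid update that nests reinforcement learning (foregone-payoff weight delta = 0) and belief learning (delta = 1). The abstract does not report the model's functional form or any parameter values, so the endowment, prize, decay, choice sensitivity, and the uniformly random opponent below are all illustrative assumptions; for simplicity the sketch also updates with expected rather than realized payoffs, whereas the paper's point is precisely that realized payoffs are noisy in the risky treatments.

```python
# Minimal sketch, not the paper's estimated model. E, V, phi, delta, lam,
# and the random opponent are assumptions made for illustration only.
import numpy as np

E, V = 60, 60                    # assumed endowment and prize
ACTIONS = np.arange(0, E + 1)    # feasible contest investments

def expected_payoff(x, x_opp):
    """Expected payoff in a standard Tullock contest: keep the uninvested
    endowment, win the prize V with probability x / (x + x_opp)."""
    p_win = 0.5 if x + x_opp == 0 else x / (x + x_opp)
    return E - x + V * p_win

def softmax(a, lam=0.2):
    z = lam * (a - a.max())      # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

rng = np.random.default_rng(0)
A = np.zeros(len(ACTIONS))       # attractions (propensities) per investment
phi, delta = 0.9, 0.5            # assumed decay and foregone-payoff weight

for t in range(200):
    x = rng.choice(ACTIONS, p=softmax(A))   # own investment this round
    x_opp = rng.choice(ACTIONS)             # placeholder opponent play
    # Hybrid update: the chosen action gets full weight, foregone actions
    # get weight delta. Updating foregone actions requires the foregone
    # payoff information that the experiment's treatments provide.
    chosen = (ACTIONS == x).astype(float)
    weight = delta + (1 - delta) * chosen
    A = phi * A + weight * np.array(
        [expected_payoff(a, x_opp) for a in ACTIONS])

print("modal investment after learning:", ACTIONS[np.argmax(A)])
```

With delta = 0 only the chosen action's attraction is reinforced, reproducing the first source of noise the abstract describes: each action is evaluated against a different sample of opponent play. With delta = 1 every action is evaluated against the same opponent sequence, which is what providing foregone payoffs accomplishes.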