Abstract

This paper formulates a game-theoretic reinforcement learning model based on the stochastic gradient method, whereby players start from their initial circumstances with dispersed information, use the expected gradient to update their choice propensities, and converge to the equilibrium predicted by belief-based models. Gradient-based reinforcement learning (G-RL) entails a model-free simulation method for estimating the gradient of expected payoff with respect to choice propensities in repeated games. Because the gradient points in the direction of steepest ascent toward the steady-state equilibrium, G-RL provides a theoretical justification for a probability-weighted, time-varying updating rule that optimally balances the trade-off between reinforcing past successful strategies (‘exploitation’) and trying other strategies (‘exploration’) when choosing actions. The effectiveness and stability of G-RL are demonstrated in a simulated call market, where both the actual effect and the foregone effect are updated simultaneously as the market equilibrates. In contrast, payoff-based reinforcement learning (P-RL) fails because its constant-sensitivity updating rule creates an imbalance between exploitation and exploration in complex environments.
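To make the propensity-updating idea concrete, the following is a minimal sketch of a gradient-based propensity update in a repeated game. It assumes a logit (softmax) choice rule over propensities and an illustrative 2x2 coordination game; the payoff matrix, the function names (`softmax`, `grl_update`), and the learning rate are hypothetical choices for exposition, not the paper's call-market setting or its exact algorithm. The key feature it illustrates is that the stochastic gradient estimate moves the chosen action's propensity (the ‘actual effect’) and the unchosen actions' propensities (the ‘foregone effect’) in the same step, each weighted by the current choice probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric 2x2 coordination game (illustrative assumption):
# PAYOFFS[own_action, opponent_action] is the realized payoff.
PAYOFFS = np.array([[1.0, 0.0],
                    [0.0, 2.0]])

def softmax(q):
    """Logit choice probabilities derived from propensities q."""
    z = np.exp(q - q.max())
    return z / z.sum()

def grl_update(q, a, payoff, eta=0.1):
    """One gradient-based propensity update.

    With a softmax choice rule, payoff * (indicator - p) is a stochastic
    estimate of the gradient of expected payoff with respect to q, so the
    chosen action (actual effect) and the unchosen actions (foregone effect)
    are updated together, weighted by the current probabilities p.
    """
    p = softmax(q)
    indicator = np.eye(len(q))[a]
    return q + eta * payoff * (indicator - p)

# Two players learning against each other in a repeated game;
# exploration comes from sampling actions according to softmax(q).
q1 = np.zeros(2)
q2 = np.zeros(2)
for t in range(5000):
    a1 = rng.choice(2, p=softmax(q1))
    a2 = rng.choice(2, p=softmax(q2))
    q1 = grl_update(q1, a1, PAYOFFS[a1, a2])
    q2 = grl_update(q2, a2, PAYOFFS[a2, a1])

print("Player 1 choice probabilities:", softmax(q1))
print("Player 2 choice probabilities:", softmax(q2))
```

Because the update size scales with the choice probabilities rather than with a constant sensitivity, exploration shrinks endogenously as propensities separate, which is the trade-off the abstract contrasts with constant-sensitivity P-RL.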
