Abstract
We study the limiting behavior of the mixed strategies that result from optimal no-regret learning in a repeated game setting where the stage game is any 2 × 2 competitive game. We consider optimal no-regret algorithms that are mean-based and monotonic in their argument. We show that for any such algorithm, the limiting mixed strategies of the players cannot converge almost surely to any Nash equilibrium. This negative result is also shown to hold under a broad relaxation of these assumptions, including popular variants of Follow-the-Regularized-Leader with optimism or adaptive step sizes. Finally, we provide partial evidence that the monotonicity and mean-based assumptions can be removed or relaxed. Our results identify the inherent stochasticity in players' realizations as a critical factor underlying this divergence, and demonstrate a crucial difference in outcomes between updates based on the opponent's mixed strategies and updates based on their realized actions.

Funding: V. Muthukumar was supported by a Simons-Berkeley Research Fellowship, NSF awards IIS-2212182 and CCF-2239151, and generous gifts from Amazon, Adobe, and Google. S. Phade acknowledges the support of the NSF [Grants CNS-1527846 and CCF-1618145] and the NSF Science & Technology Center [Grant CCF-0939370 (Science of Information)]. A. Sahai acknowledges the support of the ML4Wireless center member companies and the NSF [Grants AST-144078 and ECCS-1343398].
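To make the setting concrete, the following is a minimal sketch, not taken from the paper: multiplicative weights (Hedge), a mean-based instance of Follow-the-Regularized-Leader, playing matching pennies, a 2 × 2 competitive game whose unique Nash equilibrium has both players mixing uniformly. The `use_realizations` flag (a name introduced here for illustration) toggles between updating on the opponent's realized actions, the stochastic feedback to which the paper's non-convergence result applies, and updating on the opponent's full mixed strategy.

```python
import numpy as np

# Row player's payoffs in matching pennies; the column player gets the negation.
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def softmax(scores, eta):
    z = eta * scores
    z -= z.max()              # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def hedge_play(T=50_000, eta=0.05, use_realizations=True, seed=0):
    rng = np.random.default_rng(seed)
    # Cumulative per-action payoffs; acting on these is what makes the rule mean-based.
    score_row = np.zeros(2)
    score_col = np.zeros(2)
    for _ in range(T):
        p = softmax(score_row, eta)   # row player's mixed strategy
        q = softmax(score_col, eta)   # column player's mixed strategy
        if use_realizations:
            # Feedback from the opponent's sampled (realized) action --
            # the stochastic setting covered by the paper's negative result.
            a = rng.choice(2, p=p)
            b = rng.choice(2, p=q)
            score_row += PAYOFF[:, b]
            score_col -= PAYOFF[a, :]
        else:
            # Feedback from the opponent's full mixture (expected payoffs),
            # which removes the sampling noise entirely.
            score_row += PAYOFF @ q
            score_col -= p @ PAYOFF
    return p, q

if __name__ == "__main__":
    for flag in (True, False):
        p, q = hedge_play(use_realizations=flag)
        print(f"use_realizations={flag}: p={np.round(p, 3)}, q={np.round(q, 3)}")
```

Comparing the two branches over many seeds gives a rough sense of the phenomenon the abstract describes: with realized-action feedback the sampled trajectories carry extra stochastic drift, whereas mixture feedback is noise-free. This sketch only illustrates the feedback distinction; the paper's almost-sure non-convergence result is an analytical statement about a broad class of algorithms, not a property verified by this simulation.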