Behavioural analysis of independent value-based learning in non-cooperative games

Abstract

Multi-agent reinforcement learning has received increasing attention in cooperative games, but research on non-cooperative games lags behind. Independent value-based learning algorithms have proven simple and versatile in various contexts. In this paper, we study the behavior of these algorithms in non-cooperative settings and explain the conditions that a game must satisfy for them to work. We further test the algorithms in Food Chain, a game we propose that simulates an ecosystem. Our results show that independent value-based learning algorithms can converge to a Nash equilibrium only when that equilibrium consists of uniformly random policies over the feasible actions.
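To make the idea of independent value-based learning concrete, the following is a minimal sketch, not the paper's implementation and not its Food Chain game. It assumes a stateless, tabular Q-learner with epsilon-greedy exploration, and uses rock-paper-scissors as a stand-in non-cooperative game whose unique Nash equilibrium is uniform random play. The function names, payoff table, and hyperparameters (alpha, epsilon, steps) are illustrative assumptions. Each agent updates its values from its own reward only and treats the opponent as part of the environment.

```python
import random

# Row player's payoff in rock-paper-scissors; the column player receives the negative.
# Action indices: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def epsilon_greedy(q, epsilon, rng):
    """Explore uniformly with probability epsilon, else act greedily (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    best = max(q)
    return rng.choice([a for a, v in enumerate(q) if v == best])


def run_independent_q(steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent stateless Q-learners play the repeated game.

    Returns the final Q-tables and each agent's empirical action frequencies.
    """
    rng = random.Random(seed)
    q = [[0.0] * 3, [0.0] * 3]      # one Q-value per action, per agent
    counts = [[0] * 3, [0] * 3]     # how often each agent chose each action
    for _ in range(steps):
        a0 = epsilon_greedy(q[0], epsilon, rng)
        a1 = epsilon_greedy(q[1], epsilon, rng)
        r0 = PAYOFF[a0][a1]
        r1 = -r0                    # zero-sum game
        # Each agent sees only its own action and reward (no joint-action values).
        q[0][a0] += alpha * (r0 - q[0][a0])
        q[1][a1] += alpha * (r1 - q[1][a1])
        counts[0][a0] += 1
        counts[1][a1] += 1
    freqs = [[c / steps for c in row] for row in counts]
    return q, freqs


if __name__ == "__main__":
    _, freqs = run_independent_q()
    for i, f in enumerate(freqs):
        print(f"agent {i} empirical action frequencies:", [round(x, 3) for x in f])
```

The printed frequencies are time averages over the whole run, so they say nothing on their own about whether the greedy policies settle down. Checking convergence in the sense of the abstract would require inspecting the policies themselves over time.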

Similar Papers
  • Research Article
  • Citations: 23
  • 10.1016/j.physa.2019.122484
A differential game analysis of multipollutant transboundary pollution in river basin
  • Aug 23, 2019
  • Physica A: Statistical Mechanics and its Applications
  • Huiquan Li + 1 more

  • Book Chapter
  • Citations: 650
  • 10.1007/978-3-030-60990-0_12
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
  • Jan 1, 2021
  • Kaiqing Zhang + 2 more

Recent years have witnessed significant advances in reinforcement learning (RL), which has registered tremendous success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as continuing stimulus for researchers interested in working on this exciting while challenging topic.

  • Research Article
  • 10.1007/s40797-025-00344-3
Optimal Monetary Policy and Asset Market Shocks Under Cooperative and Non-cooperative Games
  • Nov 11, 2025
  • Italian Economic Journal
  • Christos Ioannidis + 1 more

In this paper we construct a model of a policy game to analyse the optimal reaction function of the central bank to a shock in the asset market. We consider a cooperative game and three different non-cooperative games: Nash equilibrium, Stackelberg equilibria (with two large economies, e.g. countries “a” and “c”, with an accommodative and a conservative central bank respectively), and different games in which we assume that both central banks react to a shock in their asset markets. Three major conclusions can be drawn from our work in the presence of asset market shocks. First, in a cooperative and non-cooperative game framework, we examined the optimal monetary policy assuming that the central bank considers information from the asset market. In particular, we examined the impact of shocks in the asset markets and the Phillips curve on domestic and foreign monetary policy. For the simulation we used real data for two large economies, the USA and Europe. In our view, the monetary authorities of these two economies best represent the “conservative” and the “accommodative” central banks assumed in the model, where the first is associated with the European Central Bank (ECB) and the second with the Federal Reserve (FED). The impulse response functions show that, following an unexpected increase in the US asset market, the patterns of the output responses are similar in both countries: a positive shock in the US stock market increases output gaps. The effects on inflation in the two countries follow similar patterns, but the magnitude of the impact is slightly different. Moreover, following an unexpected increase in the European stock market, the patterns of the output responses differ between the two countries: a positive shock in the EU stock market increases the output gap in the USA, while for the first six months it has a negative impact on the EU output gap. Different responses are also obtained for inflation in the two countries. Finally, we found that, under a cooperative game, the FED minimises its loss function in all four potential shocks we examined. The situation differs for the ECB, which minimises its loss function with tighter monetary policy only in the cooperative scenario with a shock in the EU stock market and when it acts as follower in the Stackelberg game with a Phillips curve shock.

  • Book Chapter
  • 10.62311/nesx/932144
Game Theory and Decision Science: Innovative Approaches to Strategic Analysis and Optimization
  • Aug 29, 2024
  • Murali Krishna Pasupuleti

Abstract: This chapter explores the integration of game theory and decision science as innovative tools for strategic analysis and optimization. It provides an in-depth examination of core concepts such as Nash Equilibrium, cooperative and non-cooperative games, and sequential games, alongside advanced topics like evolutionary game theory and mechanism design. The chapter also delves into decision-making under uncertainty, utilizing probabilistic models, utility theory, and scenario planning to optimize decisions in complex environments. Through case studies and real-world applications in business strategy, public policy, and international relations, the chapter highlights the practical implications of these theories. The discussion is rounded out with an exploration of ethical considerations and emerging trends, emphasizing the need for continuous innovation and interdisciplinary collaboration in the evolving fields of game theory and decision science. Keywords: Game Theory, Decision Science, Strategic Analysis, Optimization, Nash Equilibrium, Cooperative Games, Non-Cooperative Games, Sequential Games, Evolutionary Game Theory, Mechanism Design, Probabilistic Models, Utility Theory, Scenario Planning, Business Strategy, Public Policy, International Relations, Ethical Considerations, Emerging Trends.

  • Research Article
  • 10.3390/g16040039
Non-Cooperative Representations of Cooperative Games
  • Aug 8, 2025
  • Games
  • Justin Chan

Non-cooperative games in normal form are specified by a player set, sets of player strategies, and payoff functions. Cooperative games, meanwhile, are specified by a player set and a worth function that maps coalitions of players to payoffs they can feasibly achieve. Although these games study distinct aspects of social behavior, this paper proposes a novel attempt at relating the two models. In particular, cooperative games may be represented by a non-cooperative game in which players can freely sign binding agreements to form coalitions. These coalitions inherit a joint strategy set and seek to maximize collective payoffs. When these coalitions play against one another, the equilibrium payoffs for each coalition coincide with what is predicted by the worth function. This paper proves sufficient conditions under which cooperative games can be represented by non-cooperative games. This paper finds that all strictly superadditive partition function form (PFF) games can be represented under Nash equilibrium (NE) and rationalizability; that all weakly superadditive characteristic function form (CFF) games can be represented under NE; and that all weakly superadditive PFF games can be represented under trembling hand perfect equilibrium (THPE).

  • Research Article
  • Citations: 13
  • 10.7717/peerj-cs.410
Equilibrial service composition model in Cloud manufacturing (ESCM) based on non-cooperative and cooperative game theory for healthcare service equipping
  • Mar 1, 2021
  • PeerJ Computer Science
  • Ehsan Vaziri Goudarzi + 4 more

Industry 4.0 is the digitalization of the manufacturing systems based on Information and Communication Technologies (ICT) for developing a manufacturing system to gain efficiency and improve productivity. Cloud Manufacturing (CM) is a paradigm of Industry 4.0. Cloud Manufacturing System (CMS) considers anything as a service. The end product is developed based on the service composition in the CMS according to consumers’ needs. Also, composite services are developed based on the interaction of MCS providers from different geographical locations. Therefore, the appropriate Manufacturing Cloud Service (MCS) composition is an important problem based on the real-world conditions in CMS. The game theory studies the mathematical model development based on interactions between MCS providers according to real-world conditions. This research develops an Equilibrial Service Composition Model in Cloud Manufacturing (ESCM) based on game theory. MCS providers and consumers get benefits mutually based on ESCM. MCS providers are players in the game. The payoff function is developed based on a profit function. Also, the game strategies are the levels of Quality of Service (QoS) based on consumers’ needs in ESCM. Firstly, the article develops a composite service based on a non-cooperative game. The Nash equilibrium point demonstrates the QoS value of composite service and the payoff value for the players. Secondly, the article develops a composite service based on a cooperative game. The players participate in coalitions to develop the composite service based on formal cooperation. The grand coalition demonstrates the QoS value of composite service and the payoff value for the players in the cooperative game. The research has compared the games’ results. The players’ payoff and the QoS value are better in the cooperative game than in the non-cooperative game. Therefore, the MCS providers and consumers are satisfied mutually in the cooperative game based on ESCM. 
Finally, the article has applied ESCM in a Healthcare Service to equip 24 hospitals in the best time.

  • Conference Instance
  • Citations: 99
  • 10.5555/3237383.3237408
Learning with Opponent-Learning Awareness
  • Jul 9, 2018
  • Jakob Foerster + 5 more

Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but also can be extended to hierarchical reinforcement learning, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes an additional term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Preliminary results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma (IPD), while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods. Applied to infinitely repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round robin tournament we show that LOLA agents can successfully shape the learning of a range of multi-agent learning algorithms from literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the likelihood ratio policy gradient estimator, making the method suitable for model-free reinforcement learning. This method thus scales to large parameter and input spaces and nonlinear function approximators. We also apply LOLA to a grid world task with an embedded social dilemma using deep recurrent policies and opponent modelling. Again, by explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest.

  • Research Article
  • Citations: 2
  • 10.1016/j.neucom.2024.128170
Reinforcement learning with predefined and inferred reward machines in stochastic games
  • Jul 14, 2024
  • Neurocomputing
  • Jueming Hu + 4 more

  • Research Article
  • Citations: 2
  • 10.18869/acadpub.jemr.6.24.83
Strategic Interaction Between Government and Central Bank in Framework of Cooperative and Non-Cooperative Games
  • Sep 1, 2016
  • Journal of Research in Economic Modeling
  • Davoud Mahmoudinia + 4 more

In this paper we analyse the strategic interaction between the government and the central bank in the Iranian economy. Using dynamic differential games and Nash equilibrium in cooperative and non-cooperative settings, we try to find the optimal values of debt, deficit and monetary base. The simulation results show that the equilibrium level of debt is lower in the cooperative case than in the non-cooperative case, and that convergence is faster in the cooperative setting. In addition, the cooperative case requires less money creation and a smaller government deficit than the non-cooperative case to stabilise debt in the long run. The results also show that in both the cooperative and non-cooperative cases, under uncertainty, more active policies are used to steer debt towards its equilibrium level, and these active policies bring debt to a lower level.

  • Book Chapter
  • 10.1016/b978-0-323-91870-1.00006-9
6 - Cross-border resource management: methods
  • Jan 1, 2021
  • Cross-Border Resource Management
  • Rongxing Guo

  • Book Chapter
  • Citations: 1
  • 10.1016/b978-0-444-64002-4.00005-2
Chapter 5 - Cross-Border Resource Management: Methods
  • Oct 21, 2017
  • Cross-Border Resource Management
  • Rongxing Guo

  • Research Article
  • Citations: 1
  • 10.12988/ces.2014.411220
Cooperative and non-cooperative games for spectrum sharing in cognitive radio networks: a comparative study
  • Jan 1, 2014
  • Contemporary Engineering Sciences
  • Shelly Salim + 3 more

Spectrum sharing in cognitive radio networks is essential to ensure effective communication between secondary users. Game theory is suitable to be applied to the spectrum sharing strategies since it considers strategic interactions between users. There are two types of games based on the ability to communicate between users: cooperative game and non-cooperative game. In this paper, the two spectrum sharing methods using cooperative and non-cooperative games are analyzed and compared. The numerical analysis shows that both the cooperative game and the non-cooperative game have their own best operation environment, in terms of the secondary user population.

  • Research Article
  • Citations: 10
  • 10.1360/ssi-2020-0180
Review of the progress of communication-based multi-agent reinforcement learning
  • May 1, 2022
  • SCIENTIA SINICA Informationis
  • 涵 王 + 2 more

Reinforcement learning (RL) technology has been successfully applied to various continuous decision environments over decades of development. Nowadays, RL is attracting more attention, even being touted as one of the closest approaches to general artificial intelligence. However, real-world problems often involve multiple intelligent agents interacting with each other. Thus, we focus on multi-agent reinforcement learning (MARL) to deal with such multi-agent systems in practice. In the past decade, multi-agent systems and RL have become increasingly intertwined, gradually forming and enriching the research field of MARL. Reviewing the studies on MARL, we found that researchers mainly solve MARL problems from three perspectives: learning framework, joint action learning, and communication-based MARL. In this paper, we focus on studies from the communication perspective. We first state the reasons for choosing communication-based MARL and then list the present studies that fall into the MARL category but differ in nature. We hope that this article can provide a reference for developing MARL methods that can solve practical problems for the national welfare.

  • Conference Article
  • Citations: 2
  • 10.1109/fuzzy.1999.790129
Differential games for nonlinear stochastic systems: a fuzzy approach
  • Jan 1, 1999
  • Huey-Jian Uang + 2 more

A fuzzy differential game theory is proposed to solve the N-person (or N-player) nonlinear differential noncooperative and cooperative (team) game problems, which are not easily tackled by conventional methods. In this study, both noncooperative and cooperative quadratic differential games are considered. First, the nonlinear stochastic system is approximated by a stochastic fuzzy model. Based on the stochastic fuzzy model, a fuzzy observer-based controller is proposed to deal with the noncooperative differential game in the sense of Nash equilibrium strategies or the cooperative (team) game in the sense of Pareto-optimal strategies. Using the suboptimal approach, the outcome of the fuzzy differential game for both noncooperative and cooperative games is parameterized in terms of the eigenvalue problem (EVP). Linear matrix inequality (LMI) techniques are employed to solve these problems from the convex optimization perspective.

  • Research Article
  • 10.13053/cys-25-3-3998
Modelling and Verification Analysis of Cooperative and Non-Cooperative Games via a Modal Logic Approach
  • Sep 12, 2021
  • Computación y Sistemas
  • Zvi Retchkiman Königsberg

In game theory, a cooperative game (or coalitional game) is a game with competition between groups of players (coalitions) due to the possibility of external enforcement of cooperative behavior (e.g. through contract law). Those are opposed to non-cooperative games in which there is either no possibility to forge alliances or all agreements need to be self-enforcing (e.g. through credible threats). Cooperative games are often analyzed through the framework of cooperative game theory, which focuses on predicting which coalitions will form, the joint actions that groups take and the resulting collective payoffs. It is opposed to the traditional non-cooperative game theory which focuses on predicting individual players’ actions and payoffs and analyzing Nash equilibriums. In this work, the cooperative and non-cooperative game problem is modeled by means of a modal logic formula. Then, using the concept of logic implication, and transforming this logical implication relation into a set of clauses, a modal resolution qualitative method for verification (satisfiability) as well as performance issues, for some queries is applied.
