Two-Player Zero-Sum Discounted Games
In this chapter, we extend the notion of discounted payoff to the model of stochastic games, and we define the concept of discounted equilibrium. We then prove that every two-player zero-sum stochastic game admits a discounted value, and that each player has a stationary discounted optimal strategy. The proof uses the same tools we employed in Chapter~\ref{section:mdp} to prove that in Markov decision problems the decision maker has a stationary discounted optimal strategy.
- Conference Article
21
- 10.1109/aucc.2016.7868186
- Nov 1, 2016
In this paper, we consider the problem of inverse dynamic games: given the observed behaviour of players during a dynamic game in equilibrium, how can we determine the underlying objective functions of the game? Whereas previous work in the literature has focused on inverse static games, our work focuses on inverse dynamic games. In particular, we address the problem of estimating the unknown parameters of the objective function of a two-player zero-sum dynamic game in open-loop Nash equilibrium. We exploit necessary conditions for equilibrium in a two-player zero-sum dynamic game to develop sufficient conditions for solving the two-player zero-sum inverse dynamic game problem. The sufficient conditions hold under assumptions on the control constraints and convexity of the game dynamics, and transform the inverse two-player zero-sum dynamic game problem into the problem of solving a system of linear equations. We apply our results to a linear quadratic two-player zero-sum game, and illustrate the recovery of objective function parameters from state and control equilibrium trajectories.
- Research Article
7
- 10.3182/20131218-3-in-2045.00142
- Dec 1, 2013
- IFAC Proceedings Volumes
Online Partially Model-Free Solution of Two-Player Zero Sum Differential Games
- Research Article
7
- 10.1109/access.2021.3092748
- Jan 1, 2021
- IEEE Access
Aiming at the problem of incomplete measurement data in power system state estimation, missing data reconstructed by a residual generative adversarial network (RGAN) are introduced for power system dynamic estimation. RGAN is a deep learning model based on the idea of a “two-player zero-sum” game, in which two deep neural networks compete with each other to mine the relevant features of the data. Unlike existing structures, skip connections and a residual network (ResNet) are incorporated into the two deep neural networks, and the incomplete measurement data are reconstructed accurately from the remaining measurement data. To lessen the impact of reconstruction errors, the unscented Kalman filter (UKF) method is used to estimate the power system state. A case study on the IEEE 30-bus system shows that the RGAN-based UKF dynamic estimation maintains high accuracy under different proportions of missing data.
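The “two-player zero-sum” game underlying adversarial training of this kind is usually written as the minimax objective of Goodfellow et al.; the abstract does not restate it, so the standard form is given here for reference:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator $D$ plays the maximizer and the generator $G$ the minimizer; at the saddle point the generator's distribution matches the data distribution.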
- Video Transcripts
- 10.48448/098k-2h54
- Apr 11, 2021
- Underline Science Inc.
Off-policy evaluation (OPE) is the problem of evaluating new policies using historical data obtained from a different policy. In the recent OPE context, most studies have focused on single-player cases rather than multi-player cases. In this study, we propose OPE estimators constructed from the doubly robust and double reinforcement learning estimators in two-player zero-sum Markov games. The proposed estimators estimate exploitability, which is often used as a metric for determining how close a policy profile (i.e., a tuple of policies) is to a Nash equilibrium in two-player zero-sum games. We prove exploitability estimation error bounds for the proposed estimators. We then propose methods to find the best candidate policy profile by selecting the policy profile that minimizes the estimated exploitability within a given policy profile class. We prove regret bounds for the policy profiles selected by our methods. Finally, we demonstrate the effectiveness and performance of the proposed estimators through experiments.
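In the matrix-game special case, exploitability has a simple closed form: the row player's best-response value against the profile minus the column player's. A minimal sketch in plain Python (function and variable names are ours; the paper's estimators target the harder Markov-game setting from logged data):

```python
def exploitability(A, x, y):
    """Exploitability of the profile (x, y) in a zero-sum matrix game.

    A is the row player's payoff matrix; x and y are mixed strategies.
    The result is the row player's best-response value against y minus
    the column player's best-response value against x; it is 0 exactly
    when (x, y) is a Nash equilibrium.
    """
    m, n = len(A), len(A[0])
    # expected payoff of each pure row against y
    row_vals = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    # expected payoff (to the row player) of each pure column against x
    col_vals = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]
    return max(row_vals) - min(col_vals)

# Matching pennies: the uniform profile is the unique equilibrium.
A = [[1, -1], [-1, 1]]
print(exploitability(A, [0.5, 0.5], [0.5, 0.5]))  # 0.0
print(exploitability(A, [1.0, 0.0], [0.5, 0.5]))  # 1.0
```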
- Conference Article
11
- 10.1109/cig.2018.8490452
- Aug 1, 2018
Two fundamental problems in computational game theory are computing a Nash equilibrium and learning to exploit opponents given observations of their play (opponent exploitation). The latter is perhaps even more important than the former: Nash equilibrium does not have a compelling theoretical justification in game classes other than two-player zero-sum, and for all games one can potentially do better by exploiting perceived weaknesses of the opponent than by following a static equilibrium strategy throughout the match. The natural setting for opponent exploitation is the Bayesian setting, where we have a prior model that is integrated with observations to create a posterior opponent model that we respond to. The most natural and well-studied prior distribution is the Dirichlet distribution. An exact polynomial-time algorithm is known for best-responding to the posterior distribution for an opponent assuming a Dirichlet prior with multinomial sampling in normal-form games; however, for imperfect-information games the best known algorithm is based on approximating an infinite integral without theoretical guarantees. We present the first exact algorithm for a natural class of imperfect-information games. We demonstrate that our algorithm runs quickly in practice and outperforms the best prior approaches. We also present an algorithm for a uniform prior.
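The known normal-form result rests on linearity: expected utility is linear in the opponent's strategy, so best-responding to the Dirichlet posterior mean is an exact posterior best response. A sketch of that textbook case in plain Python (names are ours, not the paper's; the paper's contribution is the imperfect-information extension, which this does not cover):

```python
def best_response_to_dirichlet(A, alpha, counts):
    """Exact best response when the opponent's mixed strategy has a
    Dirichlet(alpha) prior and we observed `counts` plays of each of
    their actions (multinomial sampling). Because expected utility is
    linear in the opponent's strategy, best-responding to the posterior
    mean is exact, not an approximation."""
    post = [a + c for a, c in zip(alpha, counts)]
    total = sum(post)
    mean = [v / total for v in post]          # posterior mean strategy
    values = [sum(A[i][j] * mean[j] for j in range(len(mean)))
              for i in range(len(A))]
    return values.index(max(values)), mean

# Uniform prior over two opponent actions, 8 observed plays of action 0:
action, mean = best_response_to_dirichlet([[1, -1], [-1, 1]], [1, 1], [8, 0])
print(action, mean)  # 0 [0.9, 0.1]
```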
- Research Article
1
- 10.1609/aaai.v37i5.25679
- Jun 26, 2023
- Proceedings of the AAAI Conference on Artificial Intelligence
Two-player zero-sum "graph games" are central in logic, verification, and multi-agent systems. The game proceeds by placing a token on a vertex of a graph, and allowing the players to move it to produce an infinite path, which determines the winner or payoff of the game. Traditionally, the players alternate turns in moving the token. In "bidding games", however, the players have budgets and in each turn, an auction (bidding) determines which player moves the token. So far, bidding games have only been studied as full-information games. In this work we initiate the study of partial-information bidding games: we study bidding games in which a player's initial budget is drawn from a known probability distribution. We show that while for some bidding mechanisms and objectives, it is straightforward to adapt the results from the full-information setting to the partial-information setting, for others, the analysis is significantly more challenging, requires new techniques, and gives rise to interesting results. Specifically, we study games with "mean-payoff" objectives in combination with "poorman" bidding. We construct optimal strategies for a partially-informed player who plays against a fully-informed adversary. We show that, somewhat surprisingly, the "value" under pure strategies does not necessarily exist in such games.
- Research Article
12
- 10.1016/0004-3702(93)90108-n
- Dec 1, 1993
- Artificial Intelligence
The multi-player version of minimax displays game-tree pathology
- Conference Article
4
- 10.1109/icca.2019.8899568
- Jul 1, 2019
The two-player zero-sum differential game has been extensively studied, partly because its solution implies H∞ optimality. Existing studies on zero-sum differential games assume either deterministic dynamics or dynamics corrupted by additive noise. In realistic environments, high-dimensional environmental uncertainties often modulate system dynamics in a more complicated fashion. In this paper, we study the stochastic two-player zero-sum differential game governed by more general uncertain linear dynamics. We show that the optimal control policies for this game can be found by solving the Hamilton-Jacobi-Bellman (HJB) equation. We prove that with the derived optimal control policies, the system is asymptotically stable in the mean and reaches the Nash equilibrium. To solve the stochastic two-player zero-sum game online, we design a new policy iteration (PI) algorithm that integrates integral reinforcement learning (IRL) with an efficient uncertainty evaluation method, the multivariate probabilistic collocation method (MPCM). This algorithm provides a fast online solution for the stochastic two-player zero-sum differential game subject to multiple uncertainties in the system dynamics.
- Book Chapter
- 10.1007/3-540-54563-8_70
- Jan 1, 1991
It is widely believed that by searching deeper in the game tree, the decision-maker is more likely to make a better decision. Dana Nau and others have discovered pathology theorems that show the opposite: searching deeper in the game tree causes the quality of the ultimate decision to become worse, not better. The models for these theorems assume that the search procedure is minimax and the games are two-player zero-sum. This report extends Nau's pathology theorem to multi-player game trees searched with max^n, the multi-player version of minimax. Thus two-player zero-sum game trees and multi-player game trees are shown to have an important feature in common.
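The max^n rule the report extends is compact: leaves carry one payoff per player, and the player to move selects the child whose payoff vector is best in that player's own component. With payoff vectors of the form (u, -u) it reduces to two-player zero-sum minimax. A minimal sketch in plain Python (tree encoding and alternating turn order are our assumptions):

```python
def max_n(node, player, num_players):
    """max^n search: leaves are payoff tuples (one entry per player);
    at an internal node the player to move picks the child whose
    payoff vector maximizes that player's own component."""
    if isinstance(node, tuple):               # leaf: payoff vector
        return node
    child_values = [max_n(child, (player + 1) % num_players, num_players)
                    for child in node]
    return max(child_values, key=lambda v: v[player])

# Two-player zero-sum tree: payoffs are (u, -u), so max^n is minimax.
tree = [[(3, -3), (5, -5)],
        [(2, -2), (9, -9)]]
print(max_n(tree, 0, 2))  # (3, -3): the minimax value is 3
```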
- Research Article
- 10.3390/g17010009
- Feb 3, 2026
- Games
There has been significant recent progress in algorithms for approximation of Nash equilibrium in large two-player zero-sum imperfect-information games and exact computation of Nash equilibrium in multiplayer normal-form games. While counterfactual regret minimization and fictitious play are scalable to large games and have convergence guarantees in two-player zero-sum games, they do not guarantee convergence to Nash equilibrium in multiplayer games. We present an approach for exact computation of Nash equilibrium in multiplayer imperfect-information games that solves a quadratically-constrained program based on a nonlinear complementarity problem formulation from the sequence-form game representation. This approach capitalizes on recent advances for solving nonconvex quadratic programs. Our algorithm is able to quickly solve three-player Kuhn poker after removal of dominated actions. Of the available algorithms in the Gambit software suite, only the logit quantal response approach is successfully able to solve the game; however, the approach takes longer than our algorithm and also involves a degree of approximation. Our formulation also leads to a new approach for computing Nash equilibrium in multiplayer normal-form games which we demonstrate to outperform a previous quadratically-constrained program formulation.
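Fictitious play, one of the two-player zero-sum baselines the abstract contrasts against, fits in a few lines: each player repeatedly best-responds to the opponent's empirical action frequencies, and in zero-sum games the averaged strategies converge to equilibrium. A sketch in plain Python (simultaneous updates and an arbitrary opening action are our choices, not the paper's):

```python
def fictitious_play(A, iterations):
    """Fictitious play on a zero-sum matrix game with row payoffs A.
    Each round both players best-respond to the opponent's empirical
    action frequencies; in two-player zero-sum games the averaged
    strategies converge to a Nash equilibrium."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] += 1                         # arbitrary opening plays
    col_counts[0] += 1
    for _ in range(iterations):
        # row player maximizes against the column player's frequencies
        row_vals = [sum(A[i][j] * col_counts[j] for j in range(n))
                    for i in range(m)]
        # column player minimizes the row player's payoff
        col_vals = [sum(row_counts[i] * A[i][j] for i in range(m))
                    for j in range(n)]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

# Matching pennies: both averaged strategies approach (0.5, 0.5).
x, y = fictitious_play([[1, -1], [-1, 1]], 10000)
```

As the abstract notes, this convergence guarantee is specific to two-player zero-sum games and fails in general multiplayer games, which is what motivates the paper's exact formulation.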
- Research Article
225
- 10.1109/tsmc.2016.2531680
- Jul 1, 2017
- IEEE Transactions on Systems, Man, and Cybernetics: Systems
In this paper, the H∞ optimal control problem for a class of continuous-time nonlinear systems is investigated using an event-triggered method. First, the H∞ optimal control problem is formulated as a two-player zero-sum (ZS) differential game. Then, an adaptive triggering condition is derived for the ZS game with an event-triggered control policy and a time-triggered disturbance policy. The event-triggered controller is updated only when the triggering condition is not satisfied; therefore, the communication between the plant and the controller is reduced. Furthermore, a positive lower bound on the minimal intersample time is provided to avoid Zeno behavior. For implementation purposes, an event-triggered concurrent learning algorithm is proposed, where only one critic neural network (NN) is used to approximate the value function, the control policy, and the disturbance policy. During the learning process, the traditional persistence-of-excitation condition is relaxed by using recorded data and instantaneous data together. Meanwhile, the stability of the closed-loop system and the uniform ultimate boundedness (UUB) of the critic NN's parameters are proved using the Lyapunov technique. Finally, simulation results verify the feasibility of the approach for the ZS game and the corresponding H∞ control problem.
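In this ADP line of work, the ZS-game value the critic approximates satisfies the Hamilton-Jacobi-Isaacs equation. The paper's exact statement is not reproduced in the abstract; for reference, under the standard setup with dynamics $\dot{x} = f(x) + g(x)u + k(x)d$ and cost $\int_0^\infty (x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d)\,dt$, the HJI equation reads:

```latex
0 = \nabla V^{\top} f(x) + x^{\top} Q x
    - \tfrac{1}{4}\, \nabla V^{\top} g(x) R^{-1} g(x)^{\top} \nabla V
    + \tfrac{1}{4\gamma^{2}}\, \nabla V^{\top} k(x) k(x)^{\top} \nabla V
```

with saddle-point policies $u^{*} = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V$ for the controller and $d^{*} = \tfrac{1}{2\gamma^{2}} k(x)^{\top} \nabla V$ for the disturbance.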
- Conference Article
- 10.65109/tdox4540
- May 3, 2021
Off-policy evaluation (OPE) is the problem of evaluating new policies using historical data obtained from a different policy. In the recent OPE context, most studies have focused on single-player cases rather than multi-player cases. In this study, we propose OPE estimators constructed from the doubly robust and double reinforcement learning estimators in two-player zero-sum Markov games. The proposed estimators estimate exploitability, which is often used as a metric for determining how close a policy profile (i.e., a tuple of policies) is to a Nash equilibrium in two-player zero-sum games. We prove exploitability estimation error bounds for the proposed estimators. We then propose methods to find the best candidate policy profile by selecting the policy profile that minimizes the estimated exploitability within a given policy profile class. We prove regret bounds for the policy profiles selected by our methods. Finally, we demonstrate the effectiveness and performance of the proposed estimators through experiments.
- Conference Article
- 10.1109/appeec53445.2022.10072084
- Nov 20, 2022
Wind energy is a major contributor to the power system. However, the unpredictability of wind energy has a substantial impact on the electrical grid, mostly because of variable wind speed. The variability of wind energy may cause instability; the issue can be addressed by scheduling generation and load. For that purpose, economic load dispatch planning is carried out by load dispatch centers. Wind power forecasts can be highly helpful for dispatch planning as well as for selling and bidding in the energy market. Prediction can be done using different techniques such as Numerical Weather Prediction (NWP), artificial intelligence and machine learning, and time series analysis. Prediction using machine learning is highly accurate and fast compared to other techniques, particularly when employing a Generative Adversarial Network, which is inspired by the two-player zero-sum game in which the Generator and Discriminator compete against each other. Gated Recurrent Unit (GRU) based adversarial networks have lower errors compared to other approaches.
- Research Article
1
- 10.1609/aaai.v39i13.33549
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
Strategic interactions can be represented more concisely, and analyzed and solved more efficiently, if we are aware of the symmetries within the multiagent system. Symmetries also have conceptual implications, for example for equilibrium selection. We study the computational complexity of identifying and using symmetries. Using the classical framework of normal-form games, we consider game symmetries that can be across some or all players and/or actions. We find a strong connection between game symmetries and graph automorphisms, yielding graph automorphism and graph isomorphism completeness results for characterizing the symmetries present in a game. On the other hand, we also show that the problem becomes polynomial-time solvable when we restrict the consideration of actions in one of two ways. Next, we investigate when exactly game symmetries can be successfully leveraged for Nash equilibrium computation. We show that finding a Nash equilibrium that respects a given set of symmetries is PPAD- and CLS-complete in general-sum and team games respectively---that is, exactly as hard as Brouwer fixed point and gradient descent problems. Finally, we present polynomial-time methods for the special cases where we are aware of a vast number of symmetries, or where the game is two-player zero-sum and we do not even know the symmetries.
- Research Article
29
- 10.1109/access.2019.2931604
- Jan 1, 2019
- IEEE Access
Controlling the polarization states of transmit waveforms can improve the performance of radar systems, especially for main lobe jamming suppression applications. In this paper, we consider the design of optimal transmit polarizations for deceptive jamming suppression in the main lobe using a game theory framework. We propose a co-located polarization multiple-input multiple-output (MIMO) radar system that combines the advantages of MIMO radar and those offered by optimally choosing the transmit polarization to improve the jamming suppression performance. In the polarization MIMO radar, polarization diversity is employed in the transmit array, and 2-D vector sensors are adopted in the receive array to separately measure the horizontal and vertical components of the received signals. Furthermore, based on the concepts and advantages of game theory, we formulate a polarization design problem for this radar system as a two-player zero-sum (TPZS) game between the radar and jammers. Additionally, we propose two design methods for different cases: a unilateral game for dumb jammers, and a strategic game for smart jammers. The optimal strategies and Nash equilibrium solutions for the two cases are presented. The simulation results demonstrate that jamming can be effectively suppressed with the proposed radar configuration and that improved jamming suppression performance can be achieved when the transmit polarization scheme is designed using the game theory approach.