Two-player zero-sum stochastic differential games with Markov-switching jump-diffusion dynamics
- Conference Article
- 10.1109/icca.2019.8899568
- Jul 1, 2019
The two-player zero-sum differential game has been extensively studied, partly because its solution implies $H_{\infty}$ optimality. Existing studies of zero-sum differential games either assume deterministic dynamics or dynamics corrupted only by additive noise. In realistic environments, however, high-dimensional environmental uncertainties often modulate system dynamics in a more complicated fashion. In this paper, we study the stochastic two-player zero-sum differential game governed by more general uncertain linear dynamics. We show that the optimal control policies for this game can be found by solving the Hamilton-Jacobi-Bellman (HJB) equation. We prove that, under the derived optimal control policies, the system is asymptotically stable in the mean and reaches the Nash equilibrium. To solve the stochastic two-player zero-sum game online, we design a new policy iteration (PI) algorithm that integrates integral reinforcement learning (IRL) with an efficient uncertainty evaluation method, the multivariate probabilistic collocation method (MPCM). This algorithm provides a fast online solution for the stochastic two-player zero-sum differential game subject to multiple uncertainties in the system dynamics.
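In the linear-quadratic case, the HJB/Isaacs machinery reduces to a game algebraic Riccati equation. A minimal scalar sketch (all numbers hypothetical, and ignoring the abstract's Markov-switching and jump terms, which the paper treats in full):

```python
import math

# Scalar LQ zero-sum game: dx = (a*x + b1*u + b2*w) dt,
# cost J = integral of (q*x^2 + r*u^2 - gamma^2 * w^2) dt,
# u minimizes, w maximizes (hypothetical numbers).
a, b1, b2, q, r, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0

# Scalar game algebraic Riccati equation:
#   2*a*p + q - (b1**2/r)*p**2 + (b2**2/gamma**2)*p**2 = 0
c2 = b2**2 / gamma**2 - b1**2 / r          # coefficient of p^2
disc = (2 * a) ** 2 - 4 * c2 * q           # discriminant
p = (-2 * a - math.sqrt(disc)) / (2 * c2)  # stabilizing root for these numbers

# Saddle-point feedback policies: u = -(b1/r)*p*x, w = (b2/gamma^2)*p*x
a_cl = a - (b1**2 / r) * p + (b2**2 / gamma**2) * p  # closed-loop drift
```

With these numbers, p > 0 and a_cl < 0, i.e., the closed loop is stable and the Riccati residual vanishes, mirroring the stability and Nash-equilibrium claims.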
- Book Chapter
- 10.1007/978-4-431-55123-2_4
- Oct 20, 2014
In this chapter, we will deal with zero-sum two-player time-homogeneous stochastic differential games and viscosity solutions of the Isaacs equations arising from such games, via the dynamic programming principle. In Sect. 4.1, we are concerned with basic concepts and definitions and we introduce stochastic differential games, referring to (Controlled Markov Processes and Viscosity Solutions, 2nd edn. Springer, New York, 2006), Chap. XI. Then, using a semi-discretization argument, we study the DPP for lower- and upper-value functions in Sect. 4.2. In Sect. 4.3, we consider the Isaacs equations, via semigroups related to the DPP. In Sect. 4.4, we consider a link between stochastic controls and differential games via risk-sensitive controls.
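The Isaacs equations referred to here have the following schematic form (notation is illustrative, not the chapter's exact one): for dynamics $dX = b(X,y,z)\,dt + \sigma(X,y,z)\,dW$ with controls $y \in Y$, $z \in Z$ and running cost $f$,

```latex
\[
\partial_t u + H^{\pm}\big(x, Du, D^2 u\big) = 0,
\]
\[
H^{+}(x,p,M) = \min_{z \in Z}\,\max_{y \in Y}
  \Big\{ \tfrac{1}{2}\operatorname{tr}\!\big(\sigma\sigma^{\top}(x,y,z)\,M\big)
         + b(x,y,z)\cdot p + f(x,y,z) \Big\},
\]
\[
H^{-}(x,p,M) = \max_{y \in Y}\,\min_{z \in Z}
  \Big\{ \tfrac{1}{2}\operatorname{tr}\!\big(\sigma\sigma^{\top}(x,y,z)\,M\big)
         + b(x,y,z)\cdot p + f(x,y,z) \Big\},
\]
```

where $H^{+}$ and $H^{-}$ generate the upper and lower value functions, and the game admits a value when the Isaacs (minimax) condition $H^{+} = H^{-}$ holds.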
- Conference Article
- 10.1109/aucc.2016.7868186
- Nov 1, 2016
In this paper, we consider the problem of inverse dynamic games: given the observed behaviour of players during a dynamic game in equilibrium, how can we determine the underlying objective functions of the game? Whereas previous work in the literature has focused on inverse static games, our work focuses on inverse dynamic games. In particular, we address the problem of estimating the unknown parameters of the objective function of a two-player zero-sum dynamic game in open-loop Nash equilibrium. We exploit necessary conditions for equilibrium in a two-player zero-sum dynamic game to develop sufficient conditions for solving the two-player zero-sum inverse dynamic game problem. The sufficient conditions hold under assumptions on the control constraints and convexity of the game dynamics, and transform the inverse two-player zero-sum dynamic game problem into the problem of solving a system of linear equations. We apply our results to a linear quadratic two-player zero-sum game, and illustrate the recovery of objective function parameters from state and control equilibrium trajectories.
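To illustrate the "system of linear equations" reduction, here is a hypothetical scalar discrete-time analogue (the paper treats the general constrained continuous setting): generate open-loop equilibrium trajectories, then recover the unknown cost weights, with one weight normalized to fix the scale ambiguity, by linear least squares on the equilibrium necessary conditions.

```python
import numpy as np

# Scalar game (hypothetical numbers): x_{t+1} = a x_t + b u_t + c w_t,
# J = (1/2) * sum_t (q x_t^2 + u_t^2 - s w_t^2), u minimizes, w maximizes;
# the weight on u^2 is normalized to 1; (q, s) are to be recovered.
a, b, c = 0.9, 1.0, 0.5
q_true, s_true = 1.0, 2.0
T = 30

# --- forward problem: equilibrium trajectories via a Riccati sweep ---
g = b**2 - c**2 / s_true
p = np.zeros(T + 1)
p[T] = q_true                       # terminal weight (assumed known form)
for t in range(T - 1, -1, -1):
    p[t] = q_true + a**2 * p[t + 1] / (1.0 + g * p[t + 1])

x = np.zeros(T + 1); x[0] = 1.0
u = np.zeros(T); w = np.zeros(T); lam = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a * x[t] / (1.0 + g * p[t + 1])
    lam[t + 1] = p[t + 1] * x[t + 1]   # costate along the equilibrium
    u[t] = -b * lam[t + 1]             # stationarity: u_t + b*lam_{t+1} = 0
    w[t] = c * lam[t + 1] / s_true     # stationarity: -s*w_t + c*lam_{t+1} = 0

# --- inverse problem: recover (q, s) from (x, u, w) data alone ---
lam_hat = -u / b                       # lam_{t+1} recovered from u_t
# costate recursion lam_t = q x_t + a lam_{t+1} is linear in q:
reg = x[1:T]
rhs = lam_hat[:T - 1] - a * lam_hat[1:T]
q_hat = np.dot(reg, rhs) / np.dot(reg, reg)
# w-stationarity s w_t = c lam_{t+1} is linear in s:
s_hat = c * np.dot(w, lam_hat) / np.dot(w, w)
```

Because the data exactly satisfy the necessary conditions, the least-squares estimates recover the true weights up to floating-point error.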
- Conference Article
- 10.1109/wcica.2012.6359061
- Jul 1, 2012
Complex systems with components or subsystems having game-like relationships are probably the most complex ones that we encounter every day. Much progress has been made over the past half-century on differential games, which are used as a tool for modeling conflicts in dynamic systems; however, almost all of the current literature assumes that both the parameters and the structure of the game are known to the players. In many practical situations, however, the players may face unknown parameters, which motivated us to investigate a class of two-player zero-sum linear-quadratic stochastic differential games with unknown parameters in [1]. In this paper, we further consider a class of two-player nonzero-sum linear-quadratic stochastic differential games with parameters unknown to both players. We design adaptive strategies and prove that they converge to the optimal ones under some natural conditions on the true parameters of the system.
- Conference Article
- 10.1109/cdc.2007.4434763
- Jan 1, 2007
Due to the increasing use of autonomous aerial vehicles in military applications such as search, surveillance, and reconnaissance of mobile targets, the study of (multi-player) Pursuit-Evasion (PE) differential games has revived. As a basis for studying multi-player games, we focus on a two-player stochastic PE game. Based on classical differential game theory, we derive the analytical value function for a two-player stochastic PE game using the Dubins car model. Moreover, we address the issue of the finite expectation of the capture time in both the perfect and the imperfect state information cases. Sufficient conditions are provided in terms of the players' speeds, measurement accuracy, and the capture range of the pursuer. These results provide insight into stochastic PE game problems and are useful for solving multi-player games.
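The Dubins car moves at constant speed with a bounded turn rate. A minimal deterministic sketch (all numbers hypothetical; the paper's game is stochastic, and the evader here follows a fixed straight line rather than an optimal evasion policy):

```python
import math

# Pure-pursuit sketch with Dubins-car kinematics (hypothetical parameters).
dt, steps = 0.01, 2000
vp, ve = 1.2, 1.0                  # pursuer is faster than the evader
turn_max = 1.0                     # pursuer turn-rate bound (rad/s)

px, py, ph = 0.0, 0.0, 0.0         # pursuer pose (x, y, heading)
ex, ey = 5.0, 0.0                  # evader starts 5 m ahead, runs along +y

d0 = math.hypot(ex - px, ey - py)  # initial separation
for _ in range(steps):
    ey += ve * dt                  # evader: straight-line motion
    bearing = math.atan2(ey - py, ex - px)
    # wrapped heading error, steered proportionally within the turn bound
    err = math.atan2(math.sin(bearing - ph), math.cos(bearing - ph))
    ph += max(-turn_max, min(turn_max, 5.0 * err)) * dt
    px += vp * math.cos(ph) * dt   # Dubins kinematics: constant speed,
    py += vp * math.sin(ph) * dt   # heading changed only by turning
d = math.hypot(ex - px, ey - py)   # final separation
```

The speed advantage condition echoes the abstract's sufficient conditions: with vp > ve the separation shrinks, illustrating why players' speeds enter the finite-capture-time analysis.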
- Conference Article
- 10.1109/cdc.2011.6160768
- Dec 1, 2011
Complex systems with components or subsystems having game-like relationships are arguably the most complex ones. Much progress has been made in traditional game theory over the past half-century, where the structure and the parameters are assumed to be known when the players make their decisions. However, this is not the case in many practical situations, where the players may face unknown parameters. To initiate a theoretical study of such problems, we consider in this paper a class of two-player zero-sum linear-quadratic stochastic differential games, assuming that the matrices associated with the strategies of the players are unknown to both players. By using weighted least squares (WLS) estimation algorithms and a random regularization method, adaptive strategies are constructed for both players. It is shown that both adaptive strategies converge to the optimal ones under some natural conditions on the true parameters of the system. To the best of our knowledge, this work appears to be the first to address adaptive stochastic differential game problems with rigorous convergence analysis.
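The identification step behind such adaptive strategies can be sketched in a hypothetical scalar discrete-time setting (ordinary least squares stands in for the paper's WLS with random regularization, and all parameters are made up):

```python
import numpy as np

# Scalar system x_{t+1} = a x_t + b1 u_t + b2 w_t + noise, with the
# input coefficients b1, b2 unknown to the players.
rng = np.random.default_rng(0)
a, b1, b2 = 0.8, 1.0, 0.5          # true parameters
T = 500
x = np.zeros(T + 1)
u = rng.standard_normal(T)         # exploratory inputs of the two players
w = rng.standard_normal(T)
for t in range(T):
    x[t + 1] = a * x[t] + b1 * u[t] + b2 * w[t] \
               + 0.01 * rng.standard_normal()

# Least-squares fit of (a, b1, b2) from one closed-loop trajectory.
Phi = np.column_stack([x[:-1], u, w])
theta, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
a_hat, b1_hat, b2_hat = theta
```

Certainty-equivalent strategies would then be computed from (a_hat, b1_hat, b2_hat); the paper's contribution is proving that this loop converges to the optimal strategies.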
- Research Article
- 10.1214/09-aop494
- Mar 1, 2010
- The Annals of Probability
Given a bounded $\mathcal{C}^2$ domain $G\subset\mathbb{R}^m$, functions $g\in\mathcal{C}(\partial G,\mathbb{R})$ and $h\in\mathcal{C}(\bar{G},\mathbb{R}\setminus\{0\})$, let $u$ denote the unique viscosity solution to the equation $-2\Delta_{\infty}u=h$ in $G$ with boundary data $g$. We provide a representation for $u$ as the value of a two-player zero-sum stochastic differential game.
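Game representations of the infinity Laplacian are typically of tug-of-war type; a schematic $\varepsilon$-step dynamic programming principle (the normalization constants here are illustrative and vary across references, and are not the paper's exact scaling) reads:

```latex
\[
\Delta_{\infty} u := \frac{1}{|Du|^{2}} \sum_{i,j} u_{x_i} u_{x_j}\, u_{x_i x_j},
\qquad -2\Delta_{\infty} u = h \ \text{in } G, \quad u = g \ \text{on } \partial G,
\]
\[
u_{\varepsilon}(x) \;=\; \frac{1}{2}\Big( \sup_{|v| \le \varepsilon} u_{\varepsilon}(x+v)
 \;+\; \inf_{|v| \le \varepsilon} u_{\varepsilon}(x+v) \Big)
 \;+\; \varepsilon^{2}\, h(x),
\]
```

i.e., at each $\varepsilon$-step a fair coin decides which player moves the state, the running payoff $h$ accrues, and $u_{\varepsilon} \to u$ as $\varepsilon \to 0$ under suitable conditions.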
- Book Chapter
- 10.1007/978-3-319-50815-3_11
- Jan 1, 2017
In this chapter, differential games are studied for continuous-time linear and nonlinear systems, including two-player zero-sum games, multi-player zero-sum games, and multi-player nonzero-sum games, via a series of adaptive dynamic programming (ADP) approaches. First, an integral policy iteration algorithm is developed to learn online the Nash equilibrium of two-player zero-sum differential games with completely unknown continuous-time linear dynamics using the state and control data. Second, multi-player zero-sum differential games for a class of continuous-time uncertain nonlinear systems are solved by using a novel iterative ADP algorithm. Via neural network modeling for the system dynamics, the ADP technique is employed to obtain the optimal control pair iteratively so that the iterative value function reaches the optimal solution of the zero-sum differential games. Finally, an online synchronous approximate optimal learning algorithm based on policy iteration is developed to solve multi-player nonzero-sum games of continuous-time nonlinear systems without the requirement of exact knowledge of system dynamics.
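The integral-policy-iteration idea rests on an IRL Bellman identity; in the LQ zero-sum case it can be written schematically (notation illustrative) as: with value $V_i(x) = x^{\top} P_i x$ and current policies $u_i = -K_i x$, $w_i = L_i x$,

```latex
\[
x(t)^{\top} P_i\, x(t)
 \;=\; \int_{t}^{t+T} \big( x^{\top} Q\, x + u_i^{\top} R\, u_i
        - \gamma^{2} w_i^{\top} w_i \big)\, d\tau
 \;+\; x(t+T)^{\top} P_i\, x(t+T),
\]
```

which can be solved for $P_i$ from measured state trajectories over intervals $[t, t+T]$ alone, without knowledge of the drift dynamics; this is what makes the learning possible with completely unknown linear dynamics.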
- Research Article
- 10.3182/20131218-3-in-2045.00142
- Dec 1, 2013
- IFAC Proceedings Volumes
Online Partially Model-Free Solution of Two-Player Zero Sum Differential Games
- Conference Article
- 10.5555/1838206.1838229
- May 10, 2010
Games are used to evaluate and advance Multiagent and Artificial Intelligence techniques. Most of these games are deterministic with perfect information (e.g. Chess and Checkers). A deterministic game has no chance element, and in a perfect information game, all information is visible to all players. However, many real-world scenarios with competing agents are stochastic (non-deterministic) with imperfect information. For two-player zero-sum perfect-recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that provably converge to an ε-Nash equilibrium. A Nash equilibrium strategy is useful in two-player zero-sum games since it maximizes its utility against a worst-case opponent. For multiplayer (three or more player) games, however, we lose all theoretical guarantees for CFR. Nevertheless, we believe that CFR-generated agents may perform well in multiplayer games. To test this hypothesis, we used this technique to create several 3-player limit Texas Hold'em poker agents, two of which placed first and second in the 3-player event of the 2009 AAAI/IJCAI Computer Poker Competition. We also demonstrate that good strategies can be obtained by grafting sets of two-player subgame strategies onto a 3-player base strategy after one of the players is eliminated.
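The per-information-set update at the core of CFR is regret matching. A minimal self-play sketch on matching pennies (full CFR additionally weights these updates by counterfactual reach probabilities over the game tree; the tiny asymmetric seed just breaks the symmetric start):

```python
# Regret-matching self-play on matching pennies.
A = [[1.0, -1.0], [-1.0, 1.0]]     # row player's payoffs; column player gets -A
reg = [[1e-3, 0.0], [0.0, 0.0]]    # cumulative regrets (tiny asymmetric seed)
ssum = [[0.0, 0.0], [0.0, 0.0]]    # cumulative strategies
T = 20000
for _ in range(T):
    strat = []
    for p in range(2):
        pos = [max(r, 0.0) for r in reg[p]]
        z = sum(pos)
        # play proportionally to positive regret, else uniformly
        strat.append([r / z for r in pos] if z > 0 else [0.5, 0.5])
        for i in range(2):
            ssum[p][i] += strat[p][i]
    # expected utility of each pure action against the opponent's mix
    u0 = [sum(A[i][j] * strat[1][j] for j in range(2)) for i in range(2)]
    u1 = [sum(-A[i][j] * strat[0][i] for i in range(2)) for j in range(2)]
    ev0 = sum(strat[0][i] * u0[i] for i in range(2))
    ev1 = sum(strat[1][j] * u1[j] for j in range(2))
    for i in range(2):                 # accumulate instantaneous regrets
        reg[0][i] += u0[i] - ev0
        reg[1][i] += u1[i] - ev1
avg0 = [s / T for s in ssum[0]]        # average strategies approach the
avg1 = [s / T for s in ssum[1]]        # (0.5, 0.5) Nash equilibrium
```

It is the *average* strategy, not the last iterate, that converges to an ε-Nash equilibrium in two-player zero-sum games, mirroring the guarantee cited in the abstract.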
- Research Article
- 10.1109/tnet.2011.2176511
- Jun 1, 2012
- IEEE/ACM Transactions on Networking
Anonymous wireless networking is studied when an adversary monitors the transmission timing of an unknown subset of the network nodes. For a desired quality of service (QoS), as measured by network throughput, the problem of maximizing anonymity is investigated from a game-theoretic perspective. Quantifying anonymity using conditional entropy of the routes given the adversary's observation, the problem of optimizing anonymity is posed as a two-player zero-sum game between the network designer and the adversary: The task of the adversary is to choose a subset of nodes to monitor so that anonymity of routes is minimum, whereas the task of the network designer is to maximize anonymity by choosing a subset of nodes to evade flow detection by generating independent transmission schedules. In this two-player game, it is shown that a unique saddle-point equilibrium exists for a general category of finite networks. At the saddle point, the strategy of the network designer is to ensure that any subset of nodes monitored by the adversary reveals an identical amount of information about the routes. For a specific class of parallel relay networks, the theory is applied to study the optimal performance tradeoffs and equilibrium strategies. In particular, when the nodes employ transmitter-directed signaling, the tradeoff between throughput and anonymity is characterized analytically as a function of the network parameters and the fraction of nodes monitored. The results are applied to study the relationships between anonymity, the fraction of monitored relays, and the fraction of hidden relays in large networks.
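When the designer's and adversary's choices are enumerated, such a saddle point can be computed by linear programming. A hypothetical finite illustration with a made-up 2x2 payoff matrix (the paper's game ranges over subsets of nodes, with conditional route entropy as the payoff):

```python
import numpy as np
from scipy.optimize import linprog

# Zero-sum matrix game: row player maximizes A[i, j], column player minimizes.
A = np.array([[2.0, 0.0], [1.0, 3.0]])   # hypothetical payoffs
m, n = A.shape
# Variables (x_1..x_m, v): maximize v s.t. x^T A >= v per column,
# sum(x) = 1, x >= 0. linprog minimizes, so minimize -v.
c = np.zeros(m + 1); c[-1] = -1.0
A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - (x^T A)_j <= 0 for each j
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_star, value = res.x[:m], res.x[-1]
```

For this matrix the saddle point mixes both rows equally and the game value is 1.5; the saddle-point property in the abstract (every monitored subset reveals an identical amount of information) is the analogue of the equalization visible in x_star.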
- Research Article
- 10.1016/s0167-9236(99)00074-3
- Mar 1, 2000
- Decision Support Systems
Foresight-based pricing algorithms in agent economies
- Research Article
- 10.1016/j.ic.2016.10.012
- Nov 3, 2016
- Information and Computation
Doomsday equilibria for omega-regular games
- Research Article
- 10.1007/s00245-018-9529-2
- Sep 24, 2018
- Applied Mathematics & Optimization
This paper considers the problem of two-player zero-sum stochastic differential game with both players adopting impulse controls in finite horizon under rather weak assumptions on the cost functions ($c$ and $\chi$ not decreasing in time). We use the dynamic programming principle and viscosity solutions approach to show existence and uniqueness of a solution for the Hamilton-Jacobi-Bellman-Isaacs (HJBI) partial differential equation (PDE) of the game. We prove that the upper and lower value functions coincide.
- Research Article
- 10.1137/120880094
- Jan 1, 2013
- SIAM Journal on Control and Optimization
We study a two-player zero-sum stochastic differential game with both players adopting impulse controls, on a finite time horizon. The Hamilton--Jacobi--Bellman--Isaacs (HJBI) partial differential equation (PDE) of the game turns out to be a double-obstacle quasi-variational inequality; therefore the two obstacles are implicitly given. We prove that the upper and lower value functions coincide; indeed we show, by means of the dynamic programming principle for the stochastic differential game, that they are the unique viscosity solution to the HJBI equation, therefore proving that the game admits a value.
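A double-obstacle quasi-variational inequality of this kind has the following schematic shape (the notation and the ordering of the min/max are illustrative, not the paper's exact formulation; $c$ and $\chi$ are the two players' impulse costs):

```latex
\[
\min\Big\{ \max\big\{ -\partial_t V - \mathcal{L} V - f,\;
   V - \mathcal{H}^{+}_{c} V \big\},\; V - \mathcal{H}^{-}_{\chi} V \Big\} = 0,
\]
\[
\mathcal{H}^{+}_{c} V(t,x) = \inf_{\xi}\big[ V(t, x + \xi) + c(t, \xi) \big],
\qquad
\mathcal{H}^{-}_{\chi} V(t,x) = \sup_{\xi}\big[ V(t, x + \xi) - \chi(t, \xi) \big],
\]
```

so the value function is squeezed between the two intervention operators: the minimizer's obstacle $\mathcal{H}^{+}_{c}$ caps $V$ from above, the maximizer's $\mathcal{H}^{-}_{\chi}$ supports it from below, and both obstacles depend on $V$ itself, which is what makes the inequality quasi-variational and the obstacles implicit.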