Abstract

Two-player zero-sum games of infinite duration and their quantitative versions are used in verification to model the interaction between a controller (Eve) and its environment (Adam). The question usually addressed is that of the existence (and computability) of a strategy for Eve that can maximize her payoff against any strategy of Adam. In this work, we are interested in strategies of Eve that minimize her regret, i.e. strategies that minimize the difference between her actual payoff and the payoff she could have achieved if she had known the strategy of Adam in advance. We give algorithms to compute the strategies of Eve that ensure minimal regret against an adversary whose choice of strategy is (1) unrestricted, (2) limited to positional strategies, or (3) limited to word strategies, and show that the last two cases have natural modelling applications. These results apply to quantitative games defined with the classical payoff functions $$\mathsf{Inf}$$, $$\mathsf{Sup}$$, $${\mathsf{LimInf}}$$, $$\mathsf{LimSup}$$, and mean-payoff. We also show that our notion of regret minimization in which Adam is limited to word strategies generalizes the notion of good for games introduced by Henzinger and Piterman, and is related to the notion of determinization by pruning due to Aminof, Kupferman and Lampert.
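Concretely, the regret described above can be written as follows (a sketch of the standard definition; here $$\Sigma_\exists$$ and $$\Sigma_\forall$$ denote the strategy sets of Eve and Adam, and $$\mathbf{Val}(\sigma,\tau)$$ the payoff of the play induced by the strategy pair — notation assumed for exposition):

$$\mathrm{Reg} \;=\; \inf_{\sigma\in\Sigma_\exists}\ \sup_{\tau\in\Sigma_\forall}\Big(\sup_{\sigma'\in\Sigma_\exists}\mathbf{Val}(\sigma',\tau)\;-\;\mathbf{Val}(\sigma,\tau)\Big)$$

The three variants studied correspond to restricting the outer supremum to all strategies of Adam, to positional strategies only, or to word strategies only.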

Highlights

  • The model of two player games played on graphs is an adequate mathematical tool to solve important problems in computer science, and in particular the reactive-system synthesis problem [26]

  • We show that our notion of regret minimization for word strategies extends this notion to the quantitative setting (Proposition 3)

  • A strategy for Eve (Adam) is a function σ that maps partial plays ending with a vertex v in V∃ (V \V∃) to a successor of v

Summary

Introduction

The model of two-player games played on graphs is an adequate mathematical tool to solve important problems in computer science, and in particular the reactive-system synthesis problem [26]. Such a situation can be modelled by an arena in which the choices at the environment's nodes model an entire family of environments, and each memoryless strategy models a specific environment of the family. In such cases, if we want to design a controller that performs reasonably well against all the possible environments, we can consider a controller that minimizes regret: the strategy of the controller will be as close as possible to the strategy that would have been optimal had the environment been known beforehand.
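The regret-minimization idea can be illustrated on a one-shot matrix setting (an illustrative sketch, not an algorithm from the paper: rows stand for the controller's strategies, columns for the possible environments, and `minimax_regret` is a hypothetical helper name):

```python
def minimax_regret(payoffs):
    """Return the row index minimizing worst-case regret.

    payoffs[i][j] is Eve's payoff when she plays row i and the
    environment behaves as column j.
    """
    n_cols = len(payoffs[0])
    # Best achievable payoff per environment, had Eve known it in advance.
    best = [max(row[j] for row in payoffs) for j in range(n_cols)]
    # Worst-case regret of each of Eve's strategies over all environments.
    regret = [max(best[j] - row[j] for j in range(n_cols)) for row in payoffs]
    return min(range(len(payoffs)), key=lambda i: regret[i])

# Example: row 1 is optimal in no environment, yet its regret is at
# most 1 in every environment, so it minimizes worst-case regret.
payoffs = [[10, 0],
           [9, 4],
           [0, 5]]
print(minimax_regret(payoffs))  # prints 1
```

This captures the intuition of the paragraph above: the regret-minimizing choice hedges across the whole family of environments rather than betting on any single one.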

Contributions
Related works
Preliminaries
Payoff functions
Regret
Variant I
Upper bounds
Lower bounds
Memory requirements for Eve and Adam
Variant II
Memory requirements for Eve
Variant III
Additional definitions
Fixed memory for Eve
Relation to other works
Discussion
For MP