Abstract

In this paper, we propose a model-free algorithm, based on sequential decomposition, to obtain optimal policies for mean-field games. We consider finite-horizon mean-field games with a large population of homogeneous players who sequentially make strategic decisions. Each player observes a private state and a mean-field population state representing the empirical distribution of the other players' states. The mean-field state is common information among all the players in the game. The authors of [1] provided a sequential decomposition algorithm that computes mean-field equilibria for such games in linear time, rather than the exponential time required by prior approaches. In [2], we extended the idea of sequential decomposition to a model-free algorithm for these games based on Expected Sarsa. In this paper, we provide detailed convergence proofs for that algorithm. In addition, we propose an algorithm for mean-field games with unknown reward functions. The proposed algorithm learns the reward function by observing an expert's behavior and then computes the optimal policy. We illustrate our results using a cyber-physical security example.
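The abstract names Expected Sarsa as the learning rule. As a rough sketch only, the snippet below shows a standard tabular Expected Sarsa update for a finite-horizon, undiscounted problem in which the state is assumed to be a pair (private state, discretized mean-field state). The state encoding, the softmax behavior policy, and the names `expected_sarsa_update` and `softmax_policy` are illustrative assumptions, not the authors' algorithm, which additionally relies on sequential decomposition.

```python
import math
from collections import defaultdict

# Assumption for illustration: a "state" is a pair (x, z), where x is the
# player's private state and z is a discretized mean-field state.

def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def expected_sarsa_update(Q, t, state, action, reward, next_state, horizon,
                          alpha=0.1, temperature=1.0):
    """Apply one Expected Sarsa update at stage t and return the TD error.

    Q maps (t, state) -> list of action values. At the final stage
    (t == horizon - 1) the continuation value is taken to be zero.
    """
    if t == horizon - 1:
        expected_next = 0.0
    else:
        next_q = Q[(t + 1, next_state)]
        probs = softmax_policy(next_q, temperature)
        # Expected next-stage value under the current policy, instead of the
        # max (Q-learning) or the value of one sampled action (Sarsa).
        expected_next = sum(p * q for p, q in zip(probs, next_q))
    td_error = reward + expected_next - Q[(t, state)][action]
    Q[(t, state)][action] += alpha * td_error
    return td_error

# Usage: two actions, horizon of three stages, all values start at zero.
Q = defaultdict(lambda: [0.0, 0.0])
expected_sarsa_update(Q, t=0, state=(0, 1), action=1, reward=1.0,
                      next_state=(1, 1), horizon=3)
print(Q[(0, (0, 1))])  # prints [0.0, 0.1]
```

Averaging over the policy's action probabilities, rather than sampling one next action, is what distinguishes Expected Sarsa from plain Sarsa and typically reduces the variance of each update.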
