In this study, we considered the problem of inverse reinforcement learning, that is, estimating the cost functions of expert players, in multi-player differential games. We proposed two online data-driven solutions for linear–quadratic games. Both are applicable to systems that either fulfill a specific dimension criterion or have unknown cost-function matrices that satisfy a diagonal condition. The first method is partially model-free and uses the trajectories of the expert agents to solve the problem. The second method is entirely model-free and uses the trajectories of both the expert and learner agents. We determined the conditions under which the solutions are applicable and identified the requirements that the collected data must meet. We conducted numerical simulations to demonstrate the effectiveness of the proposed methods.