In this paper, we address the problem of cooperative tracking of multiple heterogeneous targets by deploying multiple and heterogeneous pursuers exhibiting different decision-making capabilities. Initially, under infinite resources, we formulate a game between the evader and the pursuing team, with an evader being the maximizing player and the pursuing team being the minimizing one. Subsequently, we relax the perfect rationality assumption via the use of a level-k thinking framework that allows the evaders to not exhibit the same levels of rationality. Such rationality policies are computed by using a reinforcement learning-based architecture and are proven to form Nash policies as the thinking levels increase. Finally, in the case of multiple pursuers against multiple targets, we develop a switched learning scheme with multiple convergence sets by assigning the most intelligent pursuers to the most intelligent evaders.
Read full abstract