In downlink communication, it is challenging for ground users to cope with the uncertain interference generated by intelligent aerial jammers. The cooperation and competition between ground users and unmanned aerial vehicle (UAV) jammers constitute a Markov game of anti-UAV jamming. A model-free method based on multi-agent reinforcement learning (MARL) is therefore adopted to handle this game. However, benchmark MARL strategies suffer from dimension explosion and convergence to local optima. To address these issues, this paper proposes a novel event-triggered multi-agent proximal policy optimization algorithm with a Beta strategy (ETMAPPO), which reduces the dimension of transmitted information and improves the efficiency of policy convergence. Under the event-triggering mechanism, agents learn when to acquire observations at each moment, thereby avoiding the transmission of low-value information. The Beta operator optimizes the action search by expanding the explored region of the policy space. Ablation simulations show that the proposed strategy achieves higher global benefits with a lower information dimension than the benchmark algorithms. In addition, the convergence results verify that the well-trained ETMAPPO attains stable jamming strategies and stable anti-jamming strategies, which approximately constitute a Nash equilibrium of the anti-jamming Markov game.
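The abstract does not specify the triggering rule or the Beta parameterization. The following is a minimal PyTorch sketch of the two ideas under common assumptions: a Beta-distribution policy head (as used for bounded continuous actions in PPO variants) and a threshold-based event trigger. The names `BetaPolicy`, `should_transmit`, and the threshold `delta` are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn
from torch.distributions import Beta

class BetaPolicy(nn.Module):
    """Policy head that samples bounded actions from a Beta distribution.

    Unlike a Gaussian head, the Beta distribution has bounded support on
    [0, 1], so sampled actions never need clipping and the policy can
    place probability mass across the whole (scaled) action interval.
    """
    def __init__(self, obs_dim: int, act_dim: int, low: float, high: float):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.alpha_head = nn.Linear(64, act_dim)
        self.beta_head = nn.Linear(64, act_dim)
        self.low, self.high = low, high

    def forward(self, obs: torch.Tensor) -> Beta:
        h = self.body(obs)
        # softplus(.) + 1 keeps both shape parameters > 1, making the
        # Beta density unimodal and its log-probability well behaved.
        alpha = nn.functional.softplus(self.alpha_head(h)) + 1.0
        beta = nn.functional.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)

    def act(self, obs: torch.Tensor):
        dist = self.forward(obs)
        u = dist.sample()                            # u lies in (0, 1)
        action = self.low + (self.high - self.low) * u
        return action, dist.log_prob(u).sum(-1)     # action, log-prob

def should_transmit(obs: torch.Tensor, last_sent: torch.Tensor,
                    delta: float = 0.1) -> bool:
    """Event trigger (assumed form): share a fresh observation with the
    other agents only when it has drifted far enough from the last one
    transmitted, suppressing low-value communication."""
    return torch.norm(obs - last_sent).item() > delta
```

In this sketch, each agent would call `should_transmit` before broadcasting its observation and draw actions from `BetaPolicy.act`; the returned log-probability plugs into the usual clipped PPO surrogate objective.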