Abstract

In recent years, significant progress has been made in the multi-target tracking (MTT) of unmanned aerial vehicle (UAV) swarms. Most existing MTT approaches rely on the ideal assumption of a pre-set target trajectory. However, in practice, the trajectory of a moving target cannot be known by the UAV in advance, which poses a great challenge for realizing real-time tracking. Meanwhile, state-of-the-art multi-agent value-based methods have achieved significant progress for cooperative tasks. In contrast, multi-agent actor-critic (MAAC) methods face high variance and credit assignment issues. To address the aforementioned issues, this paper proposes a learning-based factored multi-agent soft actor-critic (FMASAC) scheme under the maximum entropy framework, where the UAV swarm is able to learn cooperative MTT in an unknown environment. This method introduces the idea of value decomposition into the MAAC setting to reduce the variance in policy updates and learn efficient credit assignment. Moreover, to further increase the detection tracking coverage of a UAV swarm, a spatial entropy reward (SER), inspired by the spatial entropy concept, is proposed in this scheme. Experiments demonstrated that the FMASAC can significantly improve the cooperative MTT performance of a UAV swarm, and it outperforms existing baselines in terms of the mean reward and tracking success rates. Additionally, the proposed scheme scales more successfully as the number of UAVs and targets increases.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call