Abstract

An unmanned swarm system (UWS) is a multiagent system that fulfills task requirements by learning autonomous and cooperative behavior strategies. However, learning instability is inevitable in a dynamic mission setting, because the agents must continuously adapt to an evolving mission objective. This article proposes several knowledge enhancement mechanisms to improve the training efficiency and learning stability of a UWS in a confined-space confrontation mission. Specifically, a punishment for exceeding the action-space boundary and a reward for satisfying the space-time distance constraints between agents are introduced as training reward enhancements. In addition, experience sharing among agents is optimized to promote consistent behavior. We apply these mechanisms to several representative single-agent and multiagent reinforcement learning algorithms and verify their effectiveness on our proprietary simulation system, SwarmFlow. Simulations show that the proposed mechanisms improve the convergence speed and performance stability of existing algorithms. The improvement is more pronounced for multiagent reinforcement learning algorithms than for single-agent algorithms, with convergence time halved and mission success rates increased by 3–4%.
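
The two reward enhancements described above can be illustrated with a minimal sketch. It is not the article's formulation: the function name `shaped_reward`, the linear overshoot penalty, the per-neighbor bonus, and all parameter names and default weights are assumptions made for illustration, and the space-time distance constraint is simplified to a spatial distance band at a single time step.

```python
def shaped_reward(base_reward, action, action_low, action_high,
                  distances, d_min, d_max,
                  boundary_penalty=1.0, spacing_bonus=0.1):
    """Hypothetical reward shaping: penalize out-of-bounds actions and
    reward neighbor distances that fall inside an allowed band."""
    # Total amount by which each action component leaves [low, high].
    overshoot = sum(
        max(low - a, 0.0) + max(a - high, 0.0)
        for a, low, high in zip(action, action_low, action_high)
    )
    # Number of neighbors whose distance satisfies the assumed constraint.
    satisfied = sum(1 for d in distances if d_min <= d <= d_max)
    return base_reward - boundary_penalty * overshoot + spacing_bonus * satisfied


# One action component exceeds its upper bound by 0.5; one of the two
# neighbors lies inside the allowed distance band [1.0, 5.0].
r = shaped_reward(1.0, [0.5, 1.5], [-1.0, -1.0], [1.0, 1.0],
                  distances=[2.0, 10.0], d_min=1.0, d_max=5.0)
```

In this sketch the penalty grows with the size of the boundary violation rather than being a fixed constant, which is one possible design choice; the article's abstract does not specify the form used.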
