Abstract
Multi-agent decision-making faces challenges such as non-stationarity and sparse rewards, and the complexity and randomness of real environments further complicate policy development. This paper addresses the high-dimensional policy optimization problem for unmanned aerial vehicle (UAV) swarms. The problem scenario is modeled as a Markov decision process, and a real-time policy optimization algorithm based on evolution strategy (ES) pre-training is proposed. The approach combines decision-time planning with background planning to evaluate and integrate different sets of policy parameters in a temporal context. In the experimental phase, the policy network is trained with both the ES and REINFORCE algorithms on a constructed simulation platform, and comparative experiments demonstrate the effectiveness of using ES for policy pre-training. Finally, the proposed real-time policy optimization algorithm further improves the performance of the swarm by approximately 10% in simulations, offering a feasible solution for adversarial games between swarms and extending the research scope of evolutionary algorithms.
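To make the idea of ES-based policy optimization concrete, the following is a minimal sketch of a generic evolution strategy update. It is not the algorithm proposed in the paper. The names `toy_reward` and `evolution_strategy`, the quadratic toy objective, the reward normalization, and all hyperparameter values are assumptions chosen for illustration. In the UAV setting, the reward function would instead be the return of an episode on the simulation platform, and the parameter vector would be the policy network weights.

```python
import random


def toy_reward(theta, target=(0.5, -0.3, 0.8)):
    """Hypothetical stand-in for an episode return: higher is better."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def evolution_strategy(reward_fn, theta, sigma=0.1, lr=0.01,
                       pop_size=50, iterations=200, seed=0):
    """Generic ES loop: perturb the parameters with Gaussian noise,
    score each perturbation, and move along the reward-weighted noise."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # Sample one Gaussian noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
        # Evaluate each perturbed parameter vector.
        rewards = [reward_fn([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        # Normalize rewards (an assumed, commonly used stabilization step).
        mean = sum(rewards) / pop_size
        std = (sum((r - mean) ** 2 for r in rewards) / pop_size) ** 0.5
        weights = [(r - mean) / (std + 1e-8) for r in rewards]
        # Gradient estimate: weighted average of the noise vectors.
        for j in range(dim):
            grad_j = sum(w * eps[j] for w, eps in zip(weights, noises))
            theta[j] += lr * grad_j / (pop_size * sigma)
    return theta


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    trained = evolution_strategy(toy_reward, start)
    print("reward before:", toy_reward(start))
    print("reward after: ", toy_reward(trained))
```

Because the update needs only scalar rewards from whole episodes and no backpropagation through the environment, a loop of this shape is often considered for sparse-reward settings such as the one described above.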