Abstract

To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. Firstly, the hunting environment and kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. To handle the dynamically changing dimension of the observation features in the partially observable setting, a feature embedding block is proposed: it combines two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), to construct the observation feature encoding. Finally, the centralized training with decentralized execution framework is adopted to train the hunting policy; each USV in the fleet shares the same policy and acts independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transfer performance across task scenarios, and self-organization capability after damage, supporting the potential deployment and application of DPOMH-PPO in real environments.
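The abstract does not give the network details of the feature embedding block, so the following is only a minimal sketch of the idea it describes: each observed entity is embedded individually, then the entity dimension is compressed with column-wise max pooling (CMP) and column-wise average pooling (CAP) so that a variable number of observations yields a fixed-size encoding. The class name, layer sizes, and the use of PyTorch are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeatureEmbeddingBlock(nn.Module):
    """Permutation-invariant encoder for a variable number of observed entities.

    Each observed entity (e.g., a neighbouring USV or a target) is a fixed-length
    feature vector; the number of observed entities changes over time. Entities
    are embedded individually, then compressed along the entity dimension with
    column-wise max pooling (CMP) and column-wise average pooling (CAP), and the
    two pooled vectors are concatenated into a fixed-size observation encoding.
    """

    def __init__(self, entity_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(entity_dim, embed_dim),
            nn.ReLU(),
        )

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (num_entities, entity_dim); num_entities may vary per step
        h = self.embed(entities)              # (num_entities, embed_dim)
        cmp = h.max(dim=0).values             # column-wise max pooling (CMP)
        cap = h.mean(dim=0)                   # column-wise average pooling (CAP)
        return torch.cat([cmp, cap], dim=-1)  # fixed-size (2 * embed_dim,) encoding


# Example: 3 observed entities at one step, 5 at the next -> same output size
encoder = FeatureEmbeddingBlock(entity_dim=6)
print(encoder(torch.randn(3, 6)).shape)  # torch.Size([128])
print(encoder(torch.randn(5, 6)).shape)  # torch.Size([128])
```

Because both pooling operations are symmetric in the entity dimension, the encoding is invariant to the ordering of observed entities, which is what allows a single shared policy to cope with observation inputs of changing dimension.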
