ABSTRACT The problem of collaborative task decision-making for unmanned underwater vehicles (UUVs) in an unknown, dynamic ocean environment is investigated. To cope with the unknown marine environment, a partially observable Markov decision process (POMDP) is formulated to enable partially observable path planning for multiple UUVs. To address the long planning times, large data volumes, and limited multi-task decision-making capability encountered in multi-UUV underwater collaborative operations, a multi-UUV collaborative task decision-making model based on the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm is constructed for dynamic environments. Adopting the centralised training with distributed execution (CT-DE) framework gives each UUV autonomous decision-making capability and ensures task safety under weak communication conditions. Pre-training results show that the MATD3 algorithm outperforms the multi-agent deep deterministic policy gradient (MADDPG) algorithm when training multi-UUV autonomous decision-making in dynamic environments. Simulation results verify that the MATD3-based autonomous decision-making method effectively solves the multi-UUV collaborative task decision-making problem in unknown dynamic environments, and that the task process satisfies the requirements of real-time performance, safety, and economy.
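The key mechanisms that distinguish TD3-family methods such as MATD3 from MADDPG are the clipped double-Q (twin-critic) target and target-policy smoothing, both of which curb value overestimation. The sketch below illustrates these two updates in isolation with NumPy on a toy batch; all function names, noise parameters, and the toy values are illustrative assumptions, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_double_q_target(q1_next, q2_next, rewards, dones, gamma=0.99):
    """Clipped double-Q target: take the element-wise minimum of the two
    target critics so the bootstrapped value is not overestimated."""
    min_q = np.minimum(q1_next, q2_next)
    return rewards + gamma * (1.0 - dones) * min_q

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target-policy smoothing: perturb the target actor's action with
    clipped Gaussian noise before evaluating the target critics."""
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(mu)),
                    -noise_clip, noise_clip)
    return np.clip(mu + noise, -act_limit, act_limit)

# Toy batch: two transitions, the second one terminal.
rewards = np.array([1.0, 0.5])
dones = np.array([0.0, 1.0])
q1_next = np.array([10.0, 2.0])   # hypothetical target-critic-1 values
q2_next = np.array([9.0, 3.0])    # hypothetical target-critic-2 values

y = clipped_double_q_target(q1_next, q2_next, rewards, dones)
# y[0] = 1.0 + 0.99 * min(10, 9) = 9.91;  y[1] = 0.5 (terminal state)
```

In the multi-agent (MATD3) setting, each agent's pair of critics is trained centrally on the joint observations and actions of all UUVs, while each actor executes on local observations only, matching the CT-DE framework described above.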