Unmanned Aerial Vehicles (UAVs) are efficient and flexible data collection platforms, and they have been widely adopted as the Internet of Things (IoT) continues to proliferate. To meet the strict timeliness requirements of practical communication scenarios, this paper investigates multi-UAV collaborative path planning with the goal of minimizing the weighted average Age of Information (AoI) of IoT devices. To this end, the paper presents DP-MATD3, a multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm augmented with dual experience pools and particle swarm optimization (PSO), which trains multiple UAVs to autonomously search for paths that minimize the AoI. First, because neural network training is relatively slow and prone to local minima, an improved PSO algorithm is used to optimize the parameters of the MATD3 neural networks. Second, a dual experience pool mechanism is introduced, which significantly improves the efficiency of network training. Experimental results show that DP-MATD3 outperforms MATD3 in weighted average AoI, reducing it by 33.3% and 27.5% at UAV flight speeds of v = 5 m/s and v = 10 m/s, respectively.
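To make the optimization target concrete, the following minimal sketch computes a weighted average AoI over a discrete time horizon. It is an illustration under stated assumptions, not the paper's formulation: the slotted time model, the reset-to-zero rule on collection, and the names `weighted_average_aoi`, `visits`, `weights`, and `horizon` are all hypothetical.

```python
from typing import Dict, List, Set


def weighted_average_aoi(
    visits: Dict[int, Set[int]],
    weights: List[float],
    horizon: int,
) -> float:
    """Weighted average AoI over slots 1..horizon.

    Assumptions (not taken from the paper):
      - time is slotted, and every device starts with age 0 before slot 1;
      - a device's age grows by 1 per slot and resets to 0 in any slot
        where a UAV collects its data;
      - visits[k] is the set of slots in which device k is served.
    """
    total_weight = sum(weights)
    age = [0] * len(weights)
    accumulated = 0.0
    for t in range(1, horizon + 1):
        for k, w in enumerate(weights):
            if t in visits.get(k, set()):
                age[k] = 0   # fresh update collected in this slot
            else:
                age[k] += 1  # information grows one slot older
            accumulated += w * age[k]
    # Normalize by total weight and number of slots.
    return accumulated / (total_weight * horizon)


# Device 0 is visited in slots 2 and 4; device 1 is never visited.
print(weighted_average_aoi({0: {2, 4}}, [1.0, 1.0], horizon=4))  # 1.5
```

Under this toy model, a path planner lowers the metric by visiting high-weight devices more often, which is the trade-off the UAV trajectories are trained to balance.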