In recent years, unmanned aerial vehicles (UAVs) have been widely used in wireless communications owing to their low cost, small size, flexible deployment, and controllable mobility. However, because their communication links are dominated by line-of-sight (LoS) propagation, securing them remains a challenging problem. In particular, information theft and leakage may occur in the presence of eavesdroppers. This paper proposes a UAV-enabled system consisting of a relay UAV, a jammer UAV, and mobile source and destination nodes, operating in the presence of an eavesdropper, and addresses the secrecy rate maximization problem for this system. The relay UAV forwards information between pairs of moving source and destination nodes whose direct channels are interrupted by blockage or long distance, while the jammer UAV sends jamming signals to interfere with the eavesdropper and reduce the amount of leaked information. We formulate an average secrecy rate maximization problem that jointly optimizes the UAV trajectories and transmit power under given constraints. Since this problem is non-convex, we reformulate it as a Markov decision process (MDP) and solve it with deep reinforcement learning (DRL). Specifically, we adopt the proximal policy optimization (PPO) algorithm because it can handle continuous action spaces. Given the states, actions, and rewards defined for this MDP, the algorithm autonomously learns to optimize the trajectories and power allocation of the UAVs so as to maximize the average secrecy rate. Simulation results demonstrate that the proposed PPO-based average secrecy rate maximization algorithm is valid, effective, and scalable.
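As an illustration of the objective, the average secrecy rate can be sketched as the time average of the clipped difference between the destination rate and the eavesdropper rate. The sketch below is not the paper's model: the free-space LoS gain `beta0 / d^2`, the noise power, the treatment of jamming as interference at both receivers, and all function and parameter names are assumptions made for illustration only.

```python
import math

def los_gain(a, b, beta0=1e-4):
    """Assumed free-space LoS power gain beta0 / d^2 between 3-D points a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return beta0 / max(d2, 1e-9)  # guard against zero distance

def secrecy_rate(relay, jammer, dest, eve, p_relay, p_jam, noise=1e-11):
    """Per-slot secrecy rate in bit/s/Hz: [log2(1+SINR_D) - log2(1+SINR_E)]^+.

    Assumption: the jamming signal acts as interference at both the
    destination and the eavesdropper.
    """
    sinr_d = p_relay * los_gain(relay, dest) / (p_jam * los_gain(jammer, dest) + noise)
    sinr_e = p_relay * los_gain(relay, eve) / (p_jam * los_gain(jammer, eve) + noise)
    return max(0.0, math.log2(1 + sinr_d) - math.log2(1 + sinr_e))

def average_secrecy_rate(slots):
    """Mean secrecy rate over time slots; each slot is a dict of secrecy_rate arguments."""
    return sum(secrecy_rate(**s) for s in slots) / len(slots)

# Hypothetical geometry (metres) and powers (watts) for a single time slot.
slot = dict(relay=(0, 0, 100), jammer=(200, 0, 100), dest=(0, 0, 0),
            eve=(200, 0, 0), p_relay=0.1, p_jam=0.01)
print(average_secrecy_rate([slot]))
```

In a DRL formulation of this kind, a quantity like the per-slot secrecy rate would typically serve as the reward, while the UAV positions and transmit powers form the continuous action that the PPO policy outputs.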