Abstract

Deep Reinforcement Learning (DRL) is a powerful tool for optimizing communications in reconfigurable intelligent surface (RIS)-assisted millimeter-wave unmanned aerial vehicle (UAV) systems, particularly for ensuring physical layer security by dynamically maximizing signal strength for legitimate users while minimizing leakage towards eavesdroppers. However, existing approaches often overfit the DRL agent to a non-stochastic environment in which the trajectories of users and eavesdroppers are fixed. This results in a single-task DRL agent that cannot generalize beyond the training environment/task. To address this limitation, this study introduces a novel multi-task DRL framework. First, recognizing that not all tasks share the same solution, we propose a reward-driven clustering method that groups similar tasks based on their reward scores. This enables the training of a shared multi-task DRL agent for each cluster. Each cluster can be treated as an area of expertise, and the cluster-specific multi-task DRL agents are considered the 'experts'. Meanwhile, we leverage federated learning (FL) to speed up the training of the shared multi-task DRL agents. By combining these two techniques, our reward-driven clustered FL rapidly and autonomously groups similar tasks. Simulation results demonstrate a reduction in the number of agents from 50 to 10 specialized 'experts', while achieving a 10x speedup compared to conventional single-task DRL training. Furthermore, the study presents a mixture of experts (MoE) model that unifies these 'experts' into a single model. The MoE is trained to select the most suitable 'expert' when encountering new tasks. Our MoE exhibits robustness by effectively generalizing to 50 novel tasks without fine-tuning.
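The abstract describes the pipeline only at a high level. The sketch below illustrates the general idea of reward-driven task clustering followed by expert selection for a new task; the reward profiles, cluster count, and distance-based gate are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch (assumptions, not the paper's exact pipeline): cluster tasks by
# their reward profiles, then route a new task to the best-matching cluster "expert".
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Assume 50 tasks, each summarized by the episodic rewards a baseline agent
# obtains on it (random stand-ins here for the real reward evaluations).
NUM_TASKS, PROFILE_DIM, NUM_CLUSTERS = 50, 8, 10
reward_profiles = rng.normal(size=(NUM_TASKS, PROFILE_DIM))

# Reward-driven clustering: tasks with similar reward profiles share a cluster,
# so one shared multi-task DRL agent ("expert") can be trained per cluster.
kmeans = KMeans(n_clusters=NUM_CLUSTERS, n_init=10, random_state=0)
task_cluster = kmeans.fit_predict(reward_profiles)

def route_to_expert(new_task_profile: np.ndarray) -> int:
    """MoE-style gating sketch: pick the expert whose cluster centroid is
    closest to the new task's reward profile (a simple distance-based gate)."""
    distances = np.linalg.norm(kmeans.cluster_centers_ - new_task_profile, axis=1)
    return int(np.argmin(distances))

# Example: a previously unseen task is routed to one of the 10 experts
# without fine-tuning the experts themselves.
new_task = rng.normal(size=PROFILE_DIM)
print("Selected expert:", route_to_expert(new_task))
```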
