Abstract

In this paper, the convergence time of federated reinforcement learning (FRL) deployed over a realistic wireless network is studied. In the considered model, several devices and the base station (BS) jointly participate in the iterative training of an FRL algorithm. Due to limited wireless resources, the BS must select a subset of devices to exchange FRL training parameters at each iteration, and this selection significantly affects the training loss and convergence time of the considered FRL algorithm. The joint learning, wireless resource allocation, and device selection problem is formulated as an optimization problem that minimizes the FRL convergence time while meeting the FRL temporal difference (TD) error requirement. To solve this problem, a deep Q-network (DQN) based algorithm is designed. The proposed method enables the BS to dynamically select an appropriate subset of devices to join the FRL training. Given the selected devices, a resource block allocation scheme can then be derived to further minimize the FRL convergence time. Simulation results with real data show that the proposed approach can reduce the FRL convergence time by up to 44.7% compared to a baseline that randomly determines the subset of participating devices and their occupied resource blocks.
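
To illustrate the kind of mechanism the abstract describes, below is a minimal sketch of a DQN agent that, at each FRL round, picks a subset of devices to participate. The state features (channel gains and per-device TD errors), reward shaping (negative round duration with a TD-error penalty), device count, and resource-block budget are all illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: DQN-based device selection for FRL over wireless links.
# Environment details, reward shaping, and dimensions are illustrative
# assumptions, not the formulation used in the paper.
import itertools
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

NUM_DEVICES = 4          # assumed small so subsets can be enumerated as actions
NUM_RBS = 2              # assumed number of resource blocks available per round
ACTIONS = [s for r in range(1, NUM_RBS + 1)
           for s in itertools.combinations(range(NUM_DEVICES), r)]

class QNet(nn.Module):
    """Maps a state (per-device channel gains + TD errors) to Q-values per subset."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )
    def forward(self, x):
        return self.net(x)

def env_step(action):
    """Toy stand-in for one FRL round.
    Reward = negative round duration minus a penalty when the simulated TD error
    exceeds a target -- a rough proxy for the convergence-time objective."""
    chosen = ACTIONS[action]
    gains = np.random.rayleigh(1.0, NUM_DEVICES)          # assumed channel gains
    round_time = max(1.0 / gains[d] for d in chosen)      # slowest selected device
    td_error = np.random.uniform(0.0, 1.0) / len(chosen)  # more devices -> lower TD error
    reward = -round_time - (5.0 if td_error > 0.3 else 0.0)
    next_state = np.concatenate([np.random.rayleigh(1.0, NUM_DEVICES),
                                 np.random.uniform(0, 1, NUM_DEVICES)]).astype(np.float32)
    return next_state, reward

state_dim = 2 * NUM_DEVICES
qnet = QNet(state_dim, len(ACTIONS))
target = QNet(state_dim, len(ACTIONS))
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)
gamma, eps = 0.95, 0.1

state = np.random.rand(state_dim).astype(np.float32)
for step in range(2000):
    # epsilon-greedy selection of the participating-device subset
    if random.random() < eps:
        action = random.randrange(len(ACTIONS))
    else:
        with torch.no_grad():
            action = int(qnet(torch.from_numpy(state)).argmax())
    next_state, reward = env_step(action)
    buffer.append((state, action, reward, next_state))
    state = next_state

    if len(buffer) >= 64:
        batch = random.sample(buffer, 64)
        s, a, r, s2 = map(np.array, zip(*batch))
        s, s2 = torch.from_numpy(s), torch.from_numpy(s2)
        a = torch.tensor(a, dtype=torch.int64)
        r = torch.tensor(r, dtype=torch.float32)
        q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            q_next = target(s2).max(1).values
        loss = nn.functional.mse_loss(q, r + gamma * q_next)
        opt.zero_grad(); loss.backward(); opt.step()

    if step % 200 == 0:
        target.load_state_dict(qnet.state_dict())
```

In this sketch the action space enumerates all device subsets up to the resource-block budget, which only scales to a handful of devices; the paper's actual method and any resource block allocation step given the selected devices are not reproduced here.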
