Abstract

Reconfigurable intelligent surfaces (RISs) can potentially combat jamming. In a downlink multiuser OFDMA system under jamming attacks, jointly selecting the users, data streams, and modulation-coding modes for all subchannels together with the RIS configuration is non-trivial, because the problem is a mixed-integer program and because the channel state information (CSI) of the channels to and from the RIS and from an uncooperative jammer is difficult to acquire. We propose a new deep reinforcement learning (DRL)-based approach that learns from changes in the users' data rates to reject jamming and maximize the sum rate. The key idea is to decouple the continuous RIS configuration from the discrete selections of users, data streams, subchannels, and modulation-coding modes. Another critical aspect is that we show the optimal selections almost surely follow a winner-takes-all strategy. Accordingly, the new DRL framework learns the RIS configuration with a twin delayed deep deterministic policy gradient (TD3) algorithm and applies the winner-takes-all strategy to evaluate the reward, thereby reducing the action space and accelerating learning. Simulations show that the framework converges quickly and realizes the benefit of the RIS. Because it requires no CSI of the channels to and from the RIS or from the jammer, the framework offers practical value.
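
As a rough illustration of the winner-takes-all idea, the sketch below assigns each subchannel to the single (user, option) pair with the highest rate and sums the winners to form a sum-rate reward. It is not the authors' implementation: the function name `winner_takes_all`, the array layout `rates[n, k, m]`, and the folding of data-stream and modulation-coding choices into one "option" index are all assumptions made for this example. The rate table stands in for rates that the framework would observe from user feedback for a fixed RIS configuration, not compute from CSI.

```python
import numpy as np

def winner_takes_all(rates):
    """Pick one (user, option) winner per subchannel and return the sum rate.

    rates[n, k, m] is a hypothetical table: the rate on subchannel n if it is
    given entirely to user k with option m (a stream/modulation-coding choice).
    """
    rates = np.asarray(rates, dtype=float)
    n_sub, n_users, n_opts = rates.shape

    # Flatten the (user, option) axes so a single argmax finds each winner.
    flat = rates.reshape(n_sub, n_users * n_opts)
    best = flat.argmax(axis=1)

    # Map the flat winner index back to separate user and option indices.
    users, options = np.unravel_index(best, (n_users, n_opts))

    # The reward is the sum of the winning rates over all subchannels.
    sum_rate = float(flat[np.arange(n_sub), best].sum())
    return users, options, sum_rate

# Toy table: 2 subchannels, 2 users, 2 options (values are made up).
example_rates = [[[1.0, 2.0], [3.0, 0.0]],
                 [[0.0, 5.0], [4.0, 4.0]]]
users, options, reward = winner_takes_all(example_rates)
print(users, options, reward)
```

In a setup like the one described, a scalar of this kind could serve as the reward for an agent whose action is only the continuous RIS configuration, which is how the discrete selections drop out of the action space.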
