Reinforcement Learning (RL) has been employed to assign transmission parameters to all sub-carriers in a given frequency band for anti-jamming Orthogonal Frequency Division Multiplexing (OFDM) systems. However, prior work has often overlooked the effect of fading in the wireless environment and the convergence problems caused by overly large parameter sets. To address these challenges, an anti-jamming scheme was proposed that integrates reinforcement learning into a Non-Contiguous Orthogonal Frequency Division Multiplexing (NC-OFDM) communication system. First, all sub-carriers were divided into sub-bands, and a Finite State Markov Sub-bands (FSMS) model, combined with Adaptive Modulation and Coding (AMC), was established to describe the time-varying fading characteristics of each sub-band. To mitigate the instability caused by the fading channel, a joint sub-band and modulation anti-jamming decision scheme was adopted, enabling the transmitter to select the optimal sub-band and transmission rate. Finally, this decision-making process was modeled as a Markov Decision Process (MDP), and an Upper Confidence Bound based Q-learning (UCB-Q) anti-jamming algorithm was proposed to obtain the joint sub-band and transmission rate selection strategy. Simulation results indicate that the proposed algorithm converges faster and achieves higher average throughput. The algorithm also maintains comparable anti-jamming performance under time-varying dynamic jamming.
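The abstract does not give the UCB-Q update rule, so the following is only a minimal sketch of the general idea: tabular Q-learning in which exploration is driven by a UCB1-style bonus instead of epsilon-greedy selection. Everything specific here is an assumption made for illustration and is not taken from the paper: the three sub-bands, the two transmission rates, the sweep jammer, the per-band probability `P_HIGH_OK` that the fading channel supports the high rate, the reward values, and the hyperparameters `alpha`, `gamma`, and `c`. The state is the sub-band that was jammed in the previous slot, and each action is a joint (sub-band, rate) choice.

```python
import math
import random

# Hypothetical toy setting (not from the paper).
N_BANDS, N_RATES = 3, 2
RATE_REWARD = [1.0, 2.0]      # assumed throughput of the low / high rate
P_HIGH_OK = [0.9, 0.2, 0.9]   # assumed chance each band's fading supports the high rate


def sweep_jammer_step(state, action, rng):
    """Toy environment: a jammer that sweeps to the next sub-band every slot."""
    band, rate = divmod(action, N_RATES)
    jammed = (state + 1) % N_BANDS
    if band == jammed:
        reward = 0.0                      # transmission hit by the jammer
    elif rate == 1 and rng.random() > P_HIGH_OK[band]:
        reward = 0.0                      # deep fade: the high rate fails
    else:
        reward = RATE_REWARD[rate]
    return reward, jammed                 # next state = band jammed in this slot


def ucb_q_learning(n_states, n_actions, step, n_steps=3000,
                   alpha=0.1, gamma=0.9, c=1.0, seed=0):
    """Tabular Q-learning with a UCB exploration bonus on action selection."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        untried = [a for a in range(n_actions) if counts[state][a] == 0]
        if untried:
            action = untried[0]           # try every action once per state
        else:
            total = sum(counts[state])
            action = max(
                range(n_actions),
                key=lambda a: q[state][a]
                + c * math.sqrt(math.log(total) / counts[state][a]),
            )
        reward, next_state = step(state, action, rng)
        counts[state][action] += 1
        # Standard Q-learning temporal-difference update.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q, n_rates):
    """Return the greedy (sub-band, rate) pair for each state."""
    return [divmod(max(range(len(row)), key=row.__getitem__), n_rates)
            for row in q]


q_table = ucb_q_learning(N_BANDS, N_BANDS * N_RATES, sweep_jammer_step)
policy = greedy_policy(q_table, N_RATES)
print(policy)
```

In this toy setting the learned policy should steer away from the sub-band the sweep jammer will hit next. Grouping sub-carriers into sub-bands keeps the joint action set small (here 3 x 2 = 6 actions), which is the kind of parameter-set reduction the abstract associates with better convergence.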