The frequency-hopping pattern of existing frequency-hopping communication systems is not adapted to the electromagnetic interference environment, so anti-jamming is performed blindly. To address this problem, a “three-variable” frequency-hopping pattern is proposed, in which the frequency, hopping rate, and instantaneous bandwidth of the frequency-hopping signal vary randomly according to the background electromagnetic interference. The decision-making problem for the “three-variable” pattern is modeled as a Markov decision process (MDP) by constructing the state–action–reward tuple. To alleviate the dimension explosion in decision-making, the frequency is designed to vary continuously within a small frequency band selected by a pseudo-random sequence, while discrete values are designed for the hopping rate and instantaneous bandwidth. To solve this MDP efficiently, a combined deep reinforcement learning algorithm based on optimistic exploration and conservative estimation (OC-CDRL) is proposed; it combines features of the TD3 and D3QN algorithms and designs corresponding states, actions, and rewards to handle the continuous and discrete action spaces, respectively. To address the tendency of the D3QN algorithm to fall into local optima, an optimistic exploration strategy (OES) for action selection is proposed to increase the degree of exploration. Moreover, the loss function is improved by conservatively estimating state–action pairs outside the experience replay buffer, which reduces overestimation of the optimistic action-value function and improves the stability and convergence of the algorithm. Comparative simulations in different electromagnetic interference environments show that the OC-CDRL algorithm effectively avoids most regions of strong interference and achieves better adaptability and anti-jamming capability.
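The abstract does not give the exact form of the OES or the conservative loss term; the following minimal NumPy sketch illustrates the general ideas under stated assumptions: an optimistic action-selection rule modeled as a UCB-style uncertainty bonus on under-visited discrete actions (e.g., hop-rate/bandwidth combinations), and a conservative loss modeled as a CQL-style penalty that pushes down Q-values for actions not present in the replay buffer. All function names and constants here are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 8                        # hypothetical number of discrete (hop-rate, bandwidth) pairs
q = rng.normal(0.0, 0.1, N_ACTIONS)  # toy action-value estimates
counts = np.ones(N_ACTIONS)          # visit counts per discrete action

def optimistic_select(q, counts, c=1.0, t=1):
    """Optimistic exploration: add an uncertainty bonus so rarely tried
    actions are preferred, discouraging premature convergence to a local optimum."""
    bonus = c * np.sqrt(np.log(t + 1) / counts)
    return int(np.argmax(q + bonus))

def conservative_td_loss(q, a_taken, td_target, seen_mask, alpha=0.5):
    """Squared TD error for the taken action, plus a penalty that pushes down
    positive Q-values of actions absent from the experience replay buffer,
    limiting overestimation of the optimistic action-value function."""
    td = (q[a_taken] - td_target) ** 2
    penalty = alpha * np.sum(np.maximum(q[~seen_mask], 0.0))
    return td + penalty

# One illustrative step: select optimistically, then score the update conservatively.
a = optimistic_select(q, counts, c=1.0, t=1)
seen = np.zeros(N_ACTIONS, dtype=bool)
seen[a] = True
loss = conservative_td_loss(q, a, td_target=0.0, seen_mask=seen)
```

In a full agent, this discrete head would sit alongside a TD3-style continuous actor for the frequency offset within the pseudo-randomly selected band; only the discrete D3QN-style side is sketched here.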