Abstract

Conventional adaptive modulation (AM) is of limited effectiveness in the underwater acoustic (UWA) channel, whose characteristics vary rapidly in time, space, and frequency. In contrast, reinforcement-learning (RL) based AM trains online by interacting with the channel environment, allowing it to track near-optimal modulation and coding schemes. Using channel state information and the measured bit error ratio (BER) of received signals, we propose an RL-based AM strategy that adapts the modulation scheme to improve spectral efficiency and maximize data throughput in a non-stationary UWA environment. The work focuses on multi-carrier orthogonal frequency division multiplexing (OFDM). AM is formulated as a Markov decision process (MDP) and solved with the Proximal Policy Optimization (PPO) algorithm. Simulation results show that, on the UWA modulation problem, PPO converges faster and more stably than the Deep Q-Network (DQN), Double DQN (DDQN), and Q-Learning (QL) algorithms.
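To make the MDP formulation concrete, the sketch below shows one way such an AM agent could be set up: the state is the current channel quality, the action is a modulation order, and the reward is the spectral efficiency earned when the BER target is met. This is a minimal illustration, not the paper's implementation; the random-walk SNR channel model, the M-QAM BER approximation, the reward shaping, all numeric constants, and the use of the gymnasium and stable-baselines3 libraries are assumptions made here for the example.

```python
# Minimal sketch: adaptive modulation as an MDP, trained with PPO.
# Requires: numpy, gymnasium, stable-baselines3 (>= 2.0).
# Everything numeric below is illustrative, not from the paper.
import math
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class UWAAdaptiveModEnv(gym.Env):
    """Toy UWA-channel environment: each step the agent picks a modulation
    order; the reward is the bits/symbol achieved if the (approximate) BER
    stays below a target, and a penalty otherwise."""

    # Hypothetical action set: bits/symbol for BPSK, QPSK, 16-QAM, 64-QAM.
    BITS_PER_SYMBOL = np.array([1, 2, 4, 6], dtype=np.float64)

    def __init__(self, ber_target=1e-3, episode_len=200):
        super().__init__()
        self.ber_target = ber_target
        self.episode_len = episode_len
        self.action_space = spaces.Discrete(len(self.BITS_PER_SYMBOL))
        # Observation: average SNR in dB, assumed known at the transmitter.
        self.observation_space = spaces.Box(
            low=-5.0, high=35.0, shape=(1,), dtype=np.float32)
        self.snr_db = 15.0
        self.t = 0

    def _ber(self, action, snr_db):
        # Crude square M-QAM BER approximation over an AWGN-like channel,
        # standing in for a real time-varying UWA channel simulator.
        k = float(self.BITS_PER_SYMBOL[action])
        m = 2.0 ** k
        snr = 10.0 ** (snr_db / 10.0)
        q = 0.5 * math.erfc(math.sqrt(3.0 * snr / (m - 1.0)) / math.sqrt(2.0))
        return (4.0 / k) * (1.0 - 1.0 / math.sqrt(m)) * q

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.snr_db = float(self.np_random.uniform(5.0, 25.0))
        self.t = 0
        return np.array([self.snr_db], dtype=np.float32), {}

    def step(self, action):
        ber = self._ber(int(action), self.snr_db)
        # Reward: throughput in bits/symbol if the BER target is met, else -1.
        reward = (float(self.BITS_PER_SYMBOL[int(action)])
                  if ber <= self.ber_target else -1.0)
        # Random-walk SNR drift as a stand-in for UWA non-stationarity.
        self.snr_db = float(np.clip(
            self.snr_db + self.np_random.normal(0.0, 1.0), -5.0, 35.0))
        self.t += 1
        return (np.array([self.snr_db], dtype=np.float32), reward,
                False, self.t >= self.episode_len, {})


env = UWAAdaptiveModEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=20_000)  # online interaction with the channel

# After training, query the policy for a modulation choice at the current SNR.
obs, _ = env.reset(seed=0)
action, _ = model.predict(obs, deterministic=True)
print("chosen modulation (bits/symbol):",
      UWAAdaptiveModEnv.BITS_PER_SYMBOL[int(action)])
```

The same environment could be plugged into DQN or DDQN agents for the convergence comparison the abstract describes, since only the learning algorithm changes while the MDP stays fixed.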
