Deep Reinforcement Learning Based MAC Protocol for Underwater Acoustic Networks

Xiaowen Ye, Liqun Fu, Yiding Yu

doi:10.1109/tmc.2020.3029844

Abstract

Long propagation delay, which degrades the throughput of underwater acoustic networks (UWANs), is a critical issue in medium access control (MAC) protocol design for UWANs. This paper develops a deep reinforcement learning (DRL) based MAC protocol for UWANs, referred to as delayed-reward deep-reinforcement learning multiple access (DR-DLMA), to maximize network throughput by judiciously utilizing the time slots that are made available by propagation delays or left unused by other nodes. In the DR-DLMA design, we first put forth a new DRL algorithm, termed delayed-reward deep Q-network (DR-DQN). We then formulate the multiple access problem in UWANs as a reinforcement learning (RL) problem by defining state, action, and reward in the parlance of RL, and thereby realize the DR-DLMA protocol. In traditional DRL algorithms, e.g., the original DQN algorithm, the agent can access the “reward” from the environment immediately after taking an action. In contrast, in our design, the “reward” (i.e., the ACK packet) becomes available only after twice the one-way propagation delay has elapsed since the agent took the action (i.e., transmitted a data packet). The essence of DR-DQN is to incorporate the propagation delay into the DRL framework and modify the DRL algorithm accordingly. In addition, to reduce the cost of training the deep neural network (DNN) online, we provide a nimble training mechanism for DR-DQN. The optimal network throughputs in various cases are given as benchmarks. Simulation results show that our DR-DLMA protocol with the nimble training mechanism can: (i) find the optimal transmission strategy when coexisting with other protocols in a heterogeneous environment; (ii) outperform state-of-the-art MAC protocols (e.g., slotted FAMA and DOTS) in a homogeneous environment; and (iii) greatly reduce energy consumption and run-time compared with DR-DLMA with the traditional DNN training mechanism.
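The delayed-reward idea described above can be illustrated with a small sketch. This is not the paper's DR-DQN algorithm; it only shows one plausible way to pair an action with feedback that arrives a round trip later. The class name `DelayedRewardBuffer`, the method names, the assumption that the one-way delay is a whole number of slots, and the reward convention (1 for an acknowledged transmission, 0 otherwise) are all hypothetical choices made for illustration.

```python
from collections import deque

TRANSMIT, WAIT = 1, 0  # hypothetical action encoding


class DelayedRewardBuffer:
    """Holds (slot, state, action) until feedback arrives 2*D slots later.

    Assumption: the one-way propagation delay D is a whole number of slots,
    and the agent records exactly one action per slot.
    """

    def __init__(self, one_way_delay_slots):
        self.round_trip = 2 * one_way_delay_slots
        self.pending = deque()   # actions still waiting for feedback
        self.experiences = []    # completed (state, action, reward, next_state)

    def record_action(self, slot, state, action):
        # Store the action now; its reward is not yet known.
        self.pending.append((slot, state, action))

    def on_slot_feedback(self, slot, ack_received, observed_state):
        # Only the oldest pending action can mature, and only when exactly
        # one round trip has elapsed since it was taken.
        if not self.pending or self.pending[0][0] + self.round_trip != slot:
            return None
        _, state, action = self.pending.popleft()
        # Assumed reward: 1 if a transmitted packet was acknowledged, else 0.
        reward = 1.0 if (action == TRANSMIT and ack_received) else 0.0
        experience = (state, action, reward, observed_state)
        self.experiences.append(experience)
        return experience


# Usage: with a one-way delay of 2 slots, the packet sent in slot 0
# is rewarded in slot 4, when its ACK is (hypothetically) received.
buf = DelayedRewardBuffer(one_way_delay_slots=2)
acks = {4: True}  # hypothetical ACK arrival times
for slot in range(6):
    action = TRANSMIT if slot == 0 else WAIT
    buf.record_action(slot, state=slot, action=action)
    buf.on_slot_feedback(slot, acks.get(slot, False), observed_state=slot)
```

In this sketch the completed tuples in `buf.experiences` are what a learner would sample for training; a standard DQN would instead form each tuple in the same slot in which the action was taken.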

Full Text