Abstract

Due to the exponential growth in the use of Wi-Fi networks, it is necessary to study their usage patterns in dense environments, for which the legacy IEEE 802.11 MAC (Medium Access Control) protocol was not specifically designed. Although 802.11ax aims to improve Wi-Fi performance in dense scenarios through modifications in the physical layer (PHY), MAC layer operations remain unchanged and cannot provide stable performance in such scenarios. Potential applications of Deep Learning (DL) to the MAC layer of WLANs have now been recognized due to their unique features. Deep Reinforcement Learning (DRL) is a technique focused on behavior and control. In this paper, we propose an algorithm, called DRL-based Contention Window Optimization (DCWO), for setting the optimal contention window (CW) under different network conditions. The proposed algorithm operates in three phases. In the first phase, Wi-Fi operation is governed by the 802.11 standard. In the second phase, the agent decides the value of CW following the TRAIN procedure of the proposed algorithm. The final phase begins once training ends, after a time duration specified by the user. At this point the agent is fully trained and receives no further updates, and the CW is set via the OPTIMIZE process of DCWO. We use total network throughput, instantaneous network throughput, fairness index, and cumulative reward as performance metrics, and compare the proposed DCWO scheme with Centralized Contention window Optimization with DRL (CCOD). Simulation results show that DCWO with Double Deep Q-Networks (DDQN) performs better than CCOD with (i) Deep Deterministic Policy Gradient (DDPG) and (ii) Deep Q-Network (DQN). More specifically, DCWO with DDQN gives on average 28% and 23% higher network throughput than CCOD in static and dynamic scenarios, respectively. In terms of instantaneous network throughput, DCWO gives around 10% better results than CCOD. DCWO achieves near-optimal fairness in static scenarios and better fairness than CCOD with DQN or DDPG in dynamic scenarios. Finally, while the cumulative reward achieved by DCWO is almost the same as that of CCOD with DDPG, the upward trend of DCWO is still encouraging.
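The three-phase operation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Phase`, `phase_at`, and `action_to_cw`, the two time thresholds, and the exponential mapping from a discrete action to a CW value are all assumptions made for illustration. The bounds 15 and 1023 are the usual minimum and maximum CW values for OFDM-based 802.11 PHYs.

```python
from enum import Enum


class Phase(Enum):
    LEGACY = 1    # CW governed by the standard 802.11 backoff rules
    TRAIN = 2     # agent chooses CW while it is still being trained
    OPTIMIZE = 3  # agent is frozen; CW comes from the learned policy


def phase_at(t, warmup_end, training_end):
    """Return the phase active at simulation time t (hypothetical schedule).

    warmup_end and training_end are assumed, user-chosen time thresholds.
    """
    if t < warmup_end:
        return Phase.LEGACY
    if t < training_end:
        return Phase.TRAIN
    return Phase.OPTIMIZE


def action_to_cw(action, cw_min=15, cw_max=1023):
    """Map a discrete action index to a contention window value.

    Assumed mapping: CW = 2^(action + 4) - 1, clamped to [cw_min, cw_max],
    so actions 0..6 cover 15, 31, 63, ..., 1023.
    """
    cw = 2 ** (action + 4) - 1
    return max(cw_min, min(cw, cw_max))


if __name__ == "__main__":
    for t in (0.0, 20.0, 60.0):
        print(t, phase_at(t, warmup_end=10.0, training_end=50.0).name)
    print([action_to_cw(a) for a in range(7)])
```

In this sketch the only difference between the TRAIN and OPTIMIZE phases is whether the agent's parameters would still be updated; the CW selection path is the same in both.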

