Abstract

Future network services must adapt to highly dynamic uplink and downlink traffic. To fulfill this requirement, the 3rd Generation Partnership Project (3GPP) proposed dynamic time division duplex (D-TDD) technology in Long Term Evolution (LTE) Release 11. Afterward, the 3GPP RAN#86 meeting clarified that 5G NR needs to support dynamic adjustment of the duplex pattern (transmission direction) in the time domain. Although 5G NR provides a more flexible duplex pattern, how to configure an effective duplex pattern according to service traffic is still an open research area. In this research, we propose a decentralized D-TDD configuration method based on distributed multi-agent deep reinforcement learning (MARL). First, we model the D-TDD configuration problem as a dynamic programming problem. Given the buffer lengths of all UEs, we model the D-TDD configuration policy as a conditional probability distribution. Our goal is to find a D-TDD configuration policy that maximizes the expected discounted return of the sum rate of all UEs. Second, in order to reduce signaling overhead, we design a fully decentralized solution with distributed MARL, in which each agent makes decisions based only on local observations. We regard each base station (BS) as an agent, and each agent configures its uplink/downlink time slot ratio according to the queue buffer lengths of its intra-BS users (UEs). Third, to mitigate the loss in overall system revenue caused by the lack of global information in MARL, we apply leniency control and a binary LSTM (BLSTM) based auto-encoder. The leniency controller regulates the Q-value estimation process in MARL according to the Q-value and current network conditions, while the auto-encoder compensates for the inability of leniency control alone to handle complex environments and high-dimensional data. Through parallel distributed training, the global D-TDD policy is obtained. This method deploys the MARL algorithm on the Mobile Edge Computing (MEC) server of each BS and uses the storage and computing capabilities of the servers for distributed training. Simulation results show that the proposed distributed MARL converges stably in various environments and performs better than a distributed deep reinforcement learning algorithm.
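For illustration only, the sketch below captures the per-BS lenient update described above in tabular form; the agent, observation encoding, slot-ratio pattern count, and leniency schedule (LenientAgent, quantize_obs, N_PATTERNS, LENIENCY_DECAY) are our own assumptions and simplifications, not the paper's implementation, which uses a deep Q-network together with the BLSTM auto-encoder.

# Illustrative sketch only (assumed names and schedule, not the paper's code):
# a tabular lenient Q-learning agent for one BS. The observation is a coarsely
# quantized share of uplink traffic in the local UE buffers, and the action is
# the index of a UL/DL slot-ratio pattern.
import random
from collections import defaultdict

N_PATTERNS = 4           # assumed number of selectable UL/DL slot-ratio patterns
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor
LENIENCY_DECAY = 0.995   # assumed per-visit decay of leniency

class LenientAgent:
    """One agent per BS; it acts on local buffer observations only."""
    def __init__(self):
        self.q = defaultdict(lambda: [0.0] * N_PATTERNS)
        self.leniency = defaultdict(lambda: [1.0] * N_PATTERNS)

    def act(self, obs, eps=0.1):
        # Epsilon-greedy choice of a UL/DL slot-ratio pattern.
        if random.random() < eps:
            return random.randrange(N_PATTERNS)
        return max(range(N_PATTERNS), key=lambda a: self.q[obs][a])

    def update(self, obs, act, reward, next_obs):
        td = reward + GAMMA * max(self.q[next_obs]) - self.q[obs][act]
        # Lenient rule: improvements are always accepted; decreases are accepted
        # only with probability (1 - leniency), so early miscoordination with
        # neighbouring BS agents is forgiven while leniency is still high.
        if td >= 0 or random.random() > self.leniency[obs][act]:
            self.q[obs][act] += ALPHA * td
        self.leniency[obs][act] *= LENIENCY_DECAY

def quantize_obs(ul_bits, dl_bits, levels=5):
    """Assumed observation encoding: quantized uplink share of buffered traffic."""
    total = ul_bits + dl_bits
    return 0 if total == 0 else min(levels - 1, int(levels * ul_bits / total))

In the full method described in the abstract, the Q-table above is replaced by a deep network, and the high-dimensional buffer state is compressed by the BLSTM auto-encoder before it reaches the leniency-controlled update.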

Highlights

  • Mobile data traffic is forecasted to grow significantly because of the rapid change in patterns of application services and massive explosion in use of connected devices

  • We carry out a simulation to validate our proposed lenient multi-agent deep reinforcement learning (MARL) based dynamic time division duplex (D-TDD) control framework

  • We develop a D-TDD framework for 5G NR that allows each base station (BS) to dynamically adjust its duplex pattern to adapt to its service buffer

Introduction

Mobile data traffic is forecasted to grow significantly because of the rapid change in patterns of application services and the massive explosion in the use of connected devices. The behavior of UEs in these scenarios differs; that is to say, the volume and pattern of network traffic will change rapidly. In response to this sudden surge, D-TDD has been chosen as a possible solution [2,3]. Traditional static TDD (S-TDD) synchronizes the uplink/downlink (UL/DL) slot ratio configuration across all BSs, whereas D-TDD dynamically changes the ratio of downlink/uplink slots for traffic adaptation. This brings two gains to the system [4]:
