Abstract

Centralized Radio Access Networks (C-RANs) are improving their cost-efficiency through packetized fronthaul networks. Such a vision requires network congestion control algorithms that deal with sub-millisecond delay budgets while optimizing link utilization and fairness. Classic congestion control algorithms have struggled to optimize these goals simultaneously in such scenarios, so many Reinforcement Learning (RL) approaches have recently been proposed to address these limitations. However, deploying RL policies in the real world raises many challenges. This paper deals with the real-time inference challenge, where a deployed policy has to output actions within microseconds. The experiments here evaluate the tradeoff between inference time and performance for a TD3 (Twin Delayed Deep Deterministic Policy Gradient) policy baseline and for simpler Decision Tree (DT) policies extracted from TD3 via policy distillation. The results indicate that DTs of suitable depth can maintain performance similar to that of the TD3 baseline. Additionally, we show that by converting the distilled DTs to rules in C++, we can make inference time nearly negligible, i.e., on a sub-microsecond time scale. The proposed method enables the use of state-of-the-art RL techniques in congestion control scenarios with tight inference-time and computational constraints.
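To illustrate the idea of converting a distilled DT into C++ rules, the sketch below shows what a very shallow tree could look like once exported as hard-coded if/else statements; the state features (queue occupancy, link utilization), the thresholds, and the rate-multiplier actions are illustrative assumptions, not the policy learned in the paper.

```cpp
// Minimal sketch of a distilled decision tree exported as C++ rules.
// Feature names, thresholds, and actions are illustrative placeholders,
// not the actual policy distilled from TD3 in the paper.

#include <cstdio>

struct FronthaulState {
    double queue_occupancy;   // normalized buffer fill level in [0, 1]
    double link_utilization;  // normalized link load in [0, 1]
};

// Returns a multiplicative adjustment to the sending rate, mimicking the
// continuous action of a distilled DT policy with a handful of comparisons.
inline double dt_policy(const FronthaulState& s) {
    if (s.queue_occupancy < 0.30) {
        if (s.link_utilization < 0.70)
            return 1.10;   // room to grow: increase the rate
        return 1.02;       // near capacity: probe gently
    }
    if (s.queue_occupancy < 0.60)
        return 0.98;       // queue building up: back off slightly
    return 0.85;           // congestion: cut the rate aggressively
}

int main() {
    FronthaulState s{0.25, 0.55};
    std::printf("rate multiplier: %.2f\n", dt_policy(s));
    return 0;
}
```

Because the exported rules compile down to a few branches, inference needs no NN framework in the data path, which is what makes sub-microsecond decision times plausible.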

Highlights

  • With the evolution of 5th Generation Mobile Networks (5G), research on more adaptable network architectures is gaining momentum, targeting cost reduction and increased user satisfaction

  • It is becoming clear that more future-proof congestion control solutions are needed, given that 5G and Beyond 5G (B5G) scenarios include a huge variety of applications with even more stringent requirements. Together with the recent research trend towards adopting Machine Learning (ML) in communications, this has already led to several proposed ML-based congestion control protocols. In this work we focus on the challenge of real-time inference, since not all Neural Network (NN) architectures can be directly deployed in network nodes due to their computational complexity

  • Experiments and results: the experiments evaluate how well distilled Decision Trees (DTs) compare to a trained TD3 agent, both in terms of performance and inference time (a minimal timing sketch follows this list)
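A rough illustration of how such an inference-time comparison can be set up is the micro-benchmark below, which times a rule-based DT policy over many calls; this is an assumed measurement harness, not the evaluation code used in the paper.

```cpp
// Minimal timing sketch (assumed harness, not the paper's benchmark):
// measures the average per-call latency of a rule-based policy like the
// dt_policy() sketch shown earlier.

#include <chrono>
#include <cstdio>

// Stand-in for the distilled-DT rules shown earlier.
inline double dt_policy(double queue_occupancy) {
    return queue_occupancy < 0.5 ? 1.05 : 0.90;
}

int main() {
    constexpr int kCalls = 1000000;
    volatile double sink = 0.0;  // keep the loop from being optimized away

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kCalls; ++i)
        sink = sink + dt_policy(static_cast<double>(i % 100) / 100.0);
    const auto end = std::chrono::steady_clock::now();

    const double ns_per_call =
        std::chrono::duration<double, std::nano>(end - start).count() / kCalls;
    std::printf("avg inference time: %.1f ns per call\n", ns_per_call);
    return 0;
}
```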


Summary

Introduction

With the evolution of 5th Generation Mobile Networks (5G), research on more adaptable network architectures is gaining momentum, targeting cost reduction and increased user satisfaction. One alternative for meeting such requirements while improving the cost-efficiency of transport networks is migrating from dedicated fiber links (using the Common Public Radio Interface (CPRI) [3] protocol) towards more flexible, packetized network deployments, which can benefit from statistical multiplexing. This shared-infrastructure scenario brings additional challenges to transport network deployments, since fronthaul links can suffer from network congestion due to, for example, aggressive radio schedulers.

The formalism provided by Markov Decision Processes (MDPs) is a useful tool for defining the RL framework. From this perspective, specifying an environment for a particular task of interest consists of specifying four elements (S, A, r, Pr): a state space S, an action space A, a reward function r, and a state-transition probability distribution Pr that defines the dynamics of the environment.
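To make the four-element specification concrete, it can be written as below, following standard RL conventions for the reward and transition signatures (the precise state and action definitions used for fronthaul congestion control are not repeated here).

```latex
% MDP specification (S, A, r, Pr), following standard RL conventions
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, r, \Pr), \qquad
  r \colon \mathcal{S} \times \mathcal{A} \to \mathbb{R}, \qquad
  \Pr(s_{t+1} \mid s_t, a_t), \quad s_t, s_{t+1} \in \mathcal{S},\; a_t \in \mathcal{A}.
\]
```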
