Emerging Internet applications such as remotely controlled human-to-machine and tactile-haptic applications demand stringent low-latency transmission. To realise these applications, current communication networks need to reduce their latency towards the order of a millisecond. In our previous study, we used supervised machine learning techniques to analyse and optimise bandwidth allocation decisions in access networks to achieve low latency. In this paper, we propose a reinforcement learning-based solution that facilitates adaptive bandwidth allocation in access networks without requiring supervised training or prior knowledge of the underlying networks. In the proposed scheme, the central office estimates the reward of each bandwidth decision from the network latency that results from executing it. These reward estimates are then used to select decisions that reduce the latency. In particular, we discuss the algorithms that can be used for reward estimation and decision selection in the proposed scheme. Through extensive simulations, we analyse the performance of these algorithms in diverse network scenarios and validate the effectiveness of the proposed scheme in reducing network latency compared with existing schemes.
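The estimate-then-select loop described above can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so this example assumes an epsilon-greedy selector with incremental sample-average reward estimates, which is one common choice and not necessarily the one in the paper. The class `EpsilonGreedyAllocator`, the function `simulated_latency_ms`, the candidate decision values, and the choice of reward as negative latency are all hypothetical and serve only to make the loop concrete.

```python
import random


class EpsilonGreedyAllocator:
    """Hypothetical selector over a fixed set of candidate bandwidth decisions."""

    def __init__(self, decisions, epsilon=0.1, seed=0):
        self.decisions = list(decisions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.estimates = [0.0] * len(self.decisions)  # running reward estimates
        self.counts = [0] * len(self.decisions)       # times each decision was executed

    def select(self):
        # Execute every decision once before trusting any estimate.
        for index, count in enumerate(self.counts):
            if count == 0:
                return index
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.decisions))
        return max(range(len(self.decisions)), key=lambda i: self.estimates[i])

    def update(self, index, latency_ms):
        # Assumption: reward is the negative of the observed latency,
        # so lower latency means higher reward.
        reward = -latency_ms
        self.counts[index] += 1
        # Incremental sample-average update of the reward estimate.
        self.estimates[index] += (reward - self.estimates[index]) / self.counts[index]


def simulated_latency_ms(decision, rng):
    # Toy stand-in for the network (an assumption, not a real traffic model):
    # latency is lowest when the decision equals 3.0, plus bounded noise.
    return 1.0 + (decision - 3.0) ** 2 + rng.uniform(0.0, 0.2)


def run(steps=500, seed=0):
    rng = random.Random(seed)
    allocator = EpsilonGreedyAllocator([1.0, 2.0, 3.0, 4.0, 5.0], epsilon=0.1, seed=seed)
    for _ in range(steps):
        index = allocator.select()
        latency = simulated_latency_ms(allocator.decisions[index], rng)
        allocator.update(index, latency)
    best = max(range(len(allocator.decisions)), key=lambda i: allocator.estimates[i])
    return allocator.decisions[best], allocator.counts


if __name__ == "__main__":
    best_decision, counts = run()
    print("decision with highest estimated reward:", best_decision)
    print("times each decision was executed:", counts)
```

In this toy setting the selector needs no labelled training data and no model of the network: it only observes the latency that follows each decision. A real deployment would replace `simulated_latency_ms` with measured latency and would likely need to handle traffic conditions that change over time.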