In multi-provider 5G/6G networks, service delegation enables administrative domains to federate in provisioning NFV network services. Admission control, i.e., selecting the appropriate domain for service deployment without prior knowledge of the statistical distributions of service requests, is fundamental to maximizing the average profit. This paper analyzes a general federation contract model for service delegation from several perspectives. First, under the assumption of known system dynamics, we obtain the theoretically optimal performance bound by formulating the admission control problem as an infinite-horizon Markov decision process (MDP) and solving it through dynamic programming; this bound serves as a benchmark to evaluate practical solutions. Second, we apply Reinforcement Learning (RL) to tackle the problem when the arrival and departure rates are unknown. For the first time in this context, we analyze the performance of the widely used Q-Learning algorithm and prove that, although it maximizes the discounted reward, it is not an efficient solution due to its sensitivity to the discount factor. We then propose an average-reward reinforcement learning approach (named “R-Learning”) to find the policy that directly maximizes the average profit. Finally, we evaluate the different solutions through extensive simulations and experimentally using the 5Growth management and orchestration platform. Results confirm that the proposed R-Learning solution always outperforms Q-Learning and the greedy policies. Furthermore, while it exhibits at most a 9% optimality gap in the ideal simulation environment, it competes with the MDP solution in the experimental assessment.
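To illustrate the average-reward approach named in the abstract, the following is a minimal, self-contained sketch of tabular R-Learning (Schwartz's average-reward variant of Q-Learning) on a toy admission-control MDP. The environment (capacity, per-request profit, departure probability) and all parameter values are illustrative assumptions, not the paper's actual model: the agent observes the current occupancy and decides whether to accept or reject each incoming request, updating a relative action-value table `R` and a running average-reward estimate `rho` instead of discounting future rewards.

```python
import random

def r_learning(steps=20000, capacity=5, profit=1.0, depart_p=0.3,
               beta=0.1, alpha=0.01, eps=0.1, seed=0):
    """Tabular R-Learning on a toy admission-control MDP (illustrative only).

    State: number of currently admitted services (0..capacity).
    Actions: 0 = reject the incoming request, 1 = accept it.
    Accepting earns `profit` when capacity allows; admitted services
    depart with probability `depart_p` per step.
    """
    rng = random.Random(seed)
    R = {(s, a): 0.0 for s in range(capacity + 1) for a in (0, 1)}
    rho = 0.0  # running estimate of the average reward per step
    s = 0
    for _ in range(steps):
        greedy = max((0, 1), key=lambda a: R[(s, a)])
        a = greedy if rng.random() > eps else rng.choice((0, 1))
        # Transition: admit if there is room, then one service may depart.
        r, s2 = 0.0, s
        if a == 1 and s < capacity:
            r, s2 = profit, s + 1
        if s2 > 0 and rng.random() < depart_p:
            s2 -= 1
        # R-Learning update: TD error uses (r - rho) instead of discounting.
        delta = r - rho + max(R[(s2, 0)], R[(s2, 1)]) - R[(s, a)]
        R[(s, a)] += beta * delta
        if a == greedy:  # rho is updated only on greedy (on-policy) steps
            rho += alpha * delta
        s = s2
    return R, rho
```

Because every value update is driven by the differential term `r - rho`, the learned policy targets the long-run average profit directly and has no discount factor to tune, which is the property the abstract contrasts with Q-Learning.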