Network function virtualization (NFV) in elastic optical datacenter interconnections (EO-DCIs) enables flexible and timely deployment of network services. However, provisioning virtual network function service chains (vNF-SCs) in an EO-DCI is a complex and challenging problem, because it requires dynamically orchestrating the allocation of IT resources in datacenters (DCs) and spectrum resources on fiber links. In this work, we model the problem as a Markov decision process (MDP) and propose HRLOrch, a hierarchical deep reinforcement learning (DRL) model based on a graph neural network (GNN), to tackle it. To ensure universality and scalability, we design the policy neural network (NN) of HRLOrch as a GNN. Because the GNN-based policy NN operates directly on the graph-structured network state of an EO-DCI, it can adapt to an arbitrary EO-DCI topology without any structural changes. Our analysis then shows that an EO-DCI is a sparse-reward environment when a DRL model is trained to minimize the blocking probability of vNF-SCs directly. To address this issue, we design a hierarchical DRL with lower-level and upper-level models to improve the convergence of training. Specifically, the lower-level DRL optimizes the provisioning scheme of each vNF-SC to minimize its resource usage, while the upper-level DRL coordinates the provisioning of all active vNF-SCs to minimize the overall blocking probability. The two models thus operate cooperatively during training to optimize the dynamic provisioning of vNF-SCs. Our simulations demonstrate the universality and scalability of HRLOrch and confirm that it can outperform existing algorithms for vNF-SC provisioning in an EO-DCI.
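To illustrate why a GNN-based policy can be applied to an arbitrary topology without structural changes, the following is a minimal sketch and not the HRLOrch architecture itself. It assumes a simple mean-aggregation message-passing layer; the function names, the weights (`w_self`, `w_nbr`, `w_out`), the two-element node features (free IT capacity, free spectrum on incident links), and the toy topologies are all hypothetical. The point is only that the learnable parameters are shared across nodes, so their number does not depend on the graph size.

```python
import math

def message_pass(adj, feats, w_self, w_nbr):
    """One round of mean-aggregation message passing with shared weights."""
    out = {}
    for node, nbrs in adj.items():
        dim = len(feats[node])
        # Mean of the neighbours' features (zero vector for an isolated node).
        agg = [sum(feats[n][k] for n in nbrs) / len(nbrs) if nbrs else 0.0
               for k in range(dim)]
        # The same two weights are used at every node, so the parameter
        # count is independent of the topology (illustrative assumption).
        out[node] = [math.tanh(w_self * feats[node][k] + w_nbr * agg[k])
                     for k in range(dim)]
    return out

def node_policy(adj, feats, w_self=0.8, w_nbr=0.5, w_out=(1.0, 1.0), rounds=2):
    """Return a probability distribution over nodes (candidate DCs)."""
    h = feats
    for _ in range(rounds):
        h = message_pass(adj, h, w_self, w_nbr)
    # Shared linear readout per node, followed by a softmax over all nodes.
    scores = {n: sum(w * x for w, x in zip(w_out, h[n])) for n in h}
    z = sum(math.exp(s) for s in scores.values())
    return {n: math.exp(s) / z for n, s in scores.items()}

# Hypothetical 4-node ring; features are [free IT capacity, free spectrum].
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
feats4 = {0: [0.9, 0.8], 1: [0.2, 0.5], 2: [0.6, 0.6], 3: [0.4, 0.1]}

# Hypothetical 3-node line, evaluated with the same parameters.
line3 = {0: [1], 1: [0, 2], 2: [1]}
feats3 = {0: [0.3, 0.7], 1: [0.8, 0.4], 2: [0.5, 0.5]}

p4 = node_policy(ring4, feats4)
p3 = node_policy(line3, feats3)
print(p4)
print(p3)
```

The same `node_policy` call handles both the 4-node and the 3-node graph with unchanged weights, which is the property the abstract attributes to operating directly on the graph-structured network state.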