Abstract

We design an effective and scalable Deep Reinforcement Learning (DRL) approach for the Routing, Modulation and Spectrum Assignment (RMSA) problem in elastic optical networks. We use Convolutional Neural Networks (CNN) to embed the state and Deep Neural Networks (DNN) to learn the policy. We propose a novel state representation and reward function that guide the agent toward assigning appropriate routes and spectrum by incorporating information on spectrum utilisation and spectrum fragmentation. This gives the agent information about the consequence, or cost, of each action across the network, reducing the level of knowledge abstraction required of the agent. To show the effectiveness of the reward function and the importance of well-designed state representations, we design two state representations: the first with aggregation of spectrum occupancy information and the second without aggregation. The Proximal Policy Optimization (PPO) algorithm is investigated with an actor-critic model, where an entropy bonus is added to the loss function to ensure sufficient exploration. The proposed solution is compared with a greedy heuristic and with a PPO agent using a standard reward and state representation. Numerical results show that the proposed model provides high-quality solutions and scales well to dataset instances with large topologies (up to 75 nodes). The proposed PPO outperformed the baseline algorithms, obtaining the largest throughput on all test instances; in addition, its spectrum usage exhibits the lowest fragmentation.
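The actor-critic loss with an entropy bonus mentioned above can be sketched as follows. This is a minimal illustrative implementation of the standard PPO clipped surrogate objective, not the paper's actual code; the coefficient values (`clip_eps`, `value_coef`, `entropy_coef`) are common defaults assumed for illustration.

```python
import numpy as np

def ppo_loss(ratio, advantage, value, value_target, probs,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """PPO actor-critic loss with entropy bonus (illustrative sketch).

    ratio        -- pi_new(a|s) / pi_old(a|s) per sample
    advantage    -- estimated advantage per sample
    value        -- critic's value estimate per sample
    value_target -- bootstrapped return target per sample
    probs        -- full action distribution per sample (for entropy)
    """
    # Clipped surrogate objective: limits how far the new policy
    # can move from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -np.minimum(ratio * advantage, clipped * advantage).mean()

    # Critic (value) loss: squared error against the return target.
    value_loss = ((value - value_target) ** 2).mean()

    # Entropy bonus: subtracted from the loss, so maximising entropy
    # is rewarded, encouraging sufficient exploration.
    entropy = -np.sum(probs * np.log(probs + 1e-8), axis=-1).mean()

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In practice the entropy coefficient trades off exploration against exploitation: a uniform action distribution maximises the bonus, while a near-deterministic policy receives almost none.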
