Abstract

The online routing and spectrum allocation (RSA) problem has become increasingly important owing to the exponential growth of dynamic, uncertain traffic in elastic optical networks (EONs). This paper first formulates offline RSA as a mixed-integer linear program (MILP) that takes the time attribute into account, and then models online RSA as a carefully designed Markov decision process (MDP), specifying its state, action and reward. To handle this complex dynamic optimization problem, a novel algorithm is developed based on the Deep Q-network (DQN), a classic deep reinforcement learning (DRL) framework. To further improve training efficiency, a dueling network architecture, an improved ε-greedy strategy and a series of parameter adjustments are applied. Simulation results demonstrate the effectiveness of the proposed algorithm and show that it achieves a lower blocking probability than the state of the art, which further indicates the potential of DRL for solving such complex real-time decision-making problems.
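As a rough illustration of two of the ingredients named above, the following sketch shows how a dueling architecture typically combines a state value with per-action advantages, and how a plain ε-greedy rule selects an action. It is not the authors' implementation: the function names, the mean-subtraction aggregation and the unmodified ε-greedy rule are assumptions chosen for illustration, and the paper's improved ε-greedy strategy is not reproduced here.

```python
import random
from typing import List, Optional


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).

    This is the commonly used dueling aggregation (an assumption here;
    the abstract does not state which variant the paper uses).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def epsilon_greedy(q_values: List[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Plain epsilon-greedy: explore with probability epsilon, else exploit.

    Hypothetical helper; in an RSA setting each action index might stand
    for a candidate path and spectrum assignment.
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest Q-value (first one on ties)
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Toy example with three candidate actions.
q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
action = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0 means purely greedy
```

With `epsilon` set to 0 the rule always picks the action with the largest Q-value; in training, ε is usually started high and decreased over time so that exploration gives way to exploitation.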
