Abstract

Deep reinforcement learning (DRL) has shown promise in solving challenging combinatorial optimization (CO) problems, such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP). However, existing DRL methods rely on manually designed reward functions, which may be inaccurate or unrealistic, and traditional DRL algorithms suffer from unstable training and sparse rewards. To address these limitations, this paper proposes GIRL (Generative Inverse Reinforcement Learning), a method that learns 2-opt heuristics without explicit extrinsic rewards. GIRL combines generative adversarial networks (GANs) and DRL to learn effective policies and reward functions in a reverse end-to-end fashion, improving generalization. Furthermore, we introduce a self-attentional policy network tailored to 2-opt heuristics and train the framework with a soft actor-critic algorithm alongside the GAN discriminator. Extensive experiments on various TSP and VRP instances demonstrate superior performance compared to state-of-the-art methods. Moreover, integrating GANs and DRL yields data-driven reward functions, improving accuracy and realism, while the self-attentional network and the soft actor-critic algorithm enhance training stability and mitigate the sparse reward problem. This work advances reinforcement learning techniques for CO, enabling more accurate and practical optimization methods in real-world applications.
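To make the method's building blocks concrete, the minimal sketch below shows a 2-opt move on a TSP tour (the action the learned policy selects) and a GAIL-style surrogate reward derived from a discriminator's score. The function names (`two_opt_move`, `adversarial_reward`) and the exact reward form `-log(1 - D(s, a))` are illustrative assumptions; the abstract does not specify the paper's formulation.

```python
import numpy as np

def tour_length(tour, dist):
    """Total length of a closed TSP tour under distance matrix `dist`."""
    n = len(tour)
    return sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))

def two_opt_move(tour, i, j):
    """Apply a 2-opt move: reverse the segment tour[i:j+1], which removes
    two edges and reconnects the tour with two new ones."""
    new_tour = tour.copy()
    new_tour[i:j + 1] = new_tour[i:j + 1][::-1]
    return new_tour

def adversarial_reward(discriminator, state, action):
    """GAIL-style surrogate reward (an assumption, not the paper's exact
    form): r = -log(1 - D(s, a)), where D in (0, 1) scores how
    "expert-like" a state-action pair looks. This reward would replace a
    hand-designed one when training the soft actor-critic policy."""
    d = discriminator(state, action)
    return -np.log(1.0 - d + 1e-8)

# Example: one candidate improvement step on a random 5-city instance.
rng = np.random.default_rng(0)
pts = rng.random((5, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
tour = np.arange(5)
candidate = two_opt_move(tour, 1, 3)
print(tour_length(tour, dist), tour_length(candidate, dist))
```

In an adversarial training loop of this kind, the discriminator is updated to separate expert tours from policy-generated ones, and its score supplies a dense, data-driven learning signal in place of a sparse extrinsic reward.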
