Abstract

Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show that we can adapt the proposed method to two extensions of the TSP, the multiple TSP and the vehicle routing problem, achieving results on par with classical heuristics and learned methods.

Highlights

  • The traveling salesman problem (TSP) is a well-known combinatorial optimization problem

  • We propose a deep reinforcement learning algorithm trained via Policy Gradient to learn improvement heuristics based on 2-opt moves (the 2-opt operation is illustrated after this list)

  • In local search algorithms, the quality of the initial solution usually affects the quality of the final solution, since local search methods can get stuck in local optima [10]
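
A minimal, purely illustrative sketch of the 2-opt building block referenced above, assuming a tour given as a list of city indices and 2-D coordinates; the function names are ours and this is not the paper's implementation:

    import math

    # Illustrative sketch (not the paper's code): a 2-opt move deletes two
    # edges of a tour and reconnects it by reversing the segment in between.
    def two_opt_move(tour, i, j):
        """Return a copy of `tour` with positions i..j reversed."""
        return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

    def tour_length(tour, coords):
        """Length of the closed tour over 2-D city coordinates."""
        return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
                   for k in range(len(tour)))

    # A move is accepted when it shortens the tour, e.g.:
    #   new = two_opt_move(tour, i, j)
    #   if tour_length(new, coords) < tour_length(tour, coords): tour = new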

Introduction

The traveling salesman problem (TSP) is a well-known combinatorial optimization problem. Classic approaches to solving the TSP can be classified into exact and heuristic methods. The former have been extensively studied using integer linear programming [2]; they are guaranteed to find an optimal solution but are often too computationally expensive to apply to large instances. Heuristics, in contrast, trade optimality guarantees for shorter running times. Improvement heuristics enhance a feasible solution through a search procedure: starting from an initial solution S_0, the procedure repeatedly replaces the current solution S_t by a better solution S_{t+1}. Local search methods such as the effective Lin–Kernighan–Helsgaun (LKH) [11] heuristic perform well for the TSP. The procedure searches for k tour edges (k-opt moves) to be removed and replaced by new edges, resulting in a shorter tour. Metaheuristics, in turn, may accept worse solutions to allow more exploration of the search space.
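
To make the improvement loop above concrete, the following sketch shows a plain first-improvement 2-opt local search, a classical baseline rather than the learned policy proposed in this work; all names and the coordinate representation are assumptions for illustration:

    import math
    import random

    # Sketch of a classical 2-opt improvement heuristic (baseline illustration,
    # not the learned policy): start from a random tour S_0 and keep replacing
    # the current tour S_t with a shorter neighbour S_{t+1} until no improving
    # 2-opt move exists, i.e. a local optimum is reached.
    def length(tour, coords):
        return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
                   for k in range(len(tour)))

    def two_opt_local_search(coords, seed=0):
        rng = random.Random(seed)
        tour = list(range(len(coords)))
        rng.shuffle(tour)                       # random initial solution S_0
        improved = True
        while improved:                         # stop at a 2-opt local optimum
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    # candidate S_{t+1}: reverse the segment between i and j
                    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                    if length(cand, coords) < length(tour, coords):
                        tour, improved = cand, True
        return tour

    # Example: 20 random cities in the unit square
    pts = [(random.random(), random.random()) for _ in range(20)]
    print(length(two_opt_local_search(pts), pts))

Even this naive version scans O(n^2) candidate moves per improvement, which becomes costly on large instances and may still end in a poor local optimum; this is the kind of bottleneck that motivates learning a policy that points directly to promising 2-opt moves.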
