Abstract

Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show that we can adapt the proposed method to two extensions of the TSP, the multiple TSP and the vehicle routing problem, achieving results on par with classical heuristics and learned methods.

Highlights

  • The traveling salesman problem (TSP) is a well-known combinatorial optimization problem

  • We propose a deep reinforcement learning algorithm trained via Policy Gradient to learn improvement heuristics based on 2-opt moves (the 2-opt operation is illustrated after this list)

  • In local search algorithms, the quality of the initial solution usually affects the quality of the final solution, since local search methods can get stuck in local optima [10]
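
A minimal, purely illustrative sketch of the 2-opt building block referenced above, assuming a tour given as a list of city indices and 2-D coordinates; the function names are ours and this is not the paper's implementation:

    import math

    # Illustrative sketch (not the paper's code): a 2-opt move deletes two
    # edges of a tour and reconnects it by reversing the segment in between.
    def two_opt_move(tour, i, j):
        """Return a copy of `tour` with positions i..j reversed."""
        return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

    def tour_length(tour, coords):
        """Length of the closed tour over 2-D city coordinates."""
        return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
                   for k in range(len(tour)))

    # A move is accepted when it shortens the tour, e.g.:
    #   new = two_opt_move(tour, i, j)
    #   if tour_length(new, coords) < tour_length(tour, coords): tour = new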

Introduction

The traveling salesman problem (TSP) is a well-known combinatorial optimization problem. Classic approaches to solving the TSP can be classified into exact and heuristic methods. The former have been extensively studied using integer linear programming [2]; they are guaranteed to find an optimal solution but are often too computationally expensive to apply to large instances. Heuristics, in contrast, trade optimality guarantees for shorter running times. Improvement heuristics enhance a feasible solution through a search procedure: starting from an initial solution S_0, the procedure repeatedly replaces the current solution S_t by a better solution S_{t+1}. Local search methods such as the effective Lin–Kernighan–Helsgaun (LKH) [11] heuristic perform well for the TSP. The procedure searches for k tour edges (k-opt moves) to be removed and replaced by new edges, resulting in a shorter tour. Metaheuristics, in turn, may accept worse solutions to allow more exploration of the search space.
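
To make the improvement loop above concrete, the following sketch shows a plain first-improvement 2-opt local search, a classical baseline rather than the learned policy proposed in this work; all names and the coordinate representation are assumptions for illustration:

    import math
    import random

    # Sketch of a classical 2-opt improvement heuristic (baseline illustration,
    # not the learned policy): start from a random tour S_0 and keep replacing
    # the current tour S_t with a shorter neighbour S_{t+1} until no improving
    # 2-opt move exists, i.e. a local optimum is reached.
    def length(tour, coords):
        return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
                   for k in range(len(tour)))

    def two_opt_local_search(coords, seed=0):
        rng = random.Random(seed)
        tour = list(range(len(coords)))
        rng.shuffle(tour)                       # random initial solution S_0
        improved = True
        while improved:                         # stop at a 2-opt local optimum
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    # candidate S_{t+1}: reverse the segment between i and j
                    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                    if length(cand, coords) < length(tour, coords):
                        tour, improved = cand, True
        return tour

    # Example: 20 random cities in the unit square
    pts = [(random.random(), random.random()) for _ in range(20)]
    print(length(two_opt_local_search(pts), pts))

Even this naive version scans O(n^2) candidate moves per improvement, which becomes costly on large instances and may still end in a poor local optimum; this is the kind of bottleneck that motivates learning a policy that points directly to promising 2-opt moves.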
