Abstract

AbstractTravelling salesman problem (TSP) is one of the most famous problems in graph theory, as well as one of the typical nondeterministic polynomial time (NP)‐hard problems in combinatorial optimization. Reinforcement learning (RL) has been widely regarded as an effective tool for solving combinatorial optimization problems. This paper attempts to solve the TSP problem using different reinforcement learning algorithms and evaluated the performance of three RL algorithms (Q‐Learning, SARSA, and Double Q‐Learning) under different reward functions, ε‐greedy decay strategies, and running times. A comprehensive analysis and comparison of the three algorithms mentioned above were conducted in the experiment. First, the experimental results indicate that the Double Q‐Learning algorithm is the best algorithm. Among the eight TSP instances, the Double Q‐Learning algorithm outperforms the other two algorithms in five instances. Additionally, it has shorter runtimes compared to the SARSA algorithm and similar runtimes to the Q‐Learning algorithm across all instances. Second, upon analysing the results, it was found that using the reward strategy contributes to obtaining the best results for all algorithms. Among the 24 combinations of 3 algorithms and 8 instances, 17 combinations achieved the best results when the reward strategy was set to .

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call