This article proposes utilizing a single deep reinforcement learning model to solve combinatorial multiobjective optimization problems. We use the well-known multiobjective traveling salesman problem (MOTSP) as an example. Our proposed method employs an encoder-decoder framework to learn the mapping from the MOTSP instance to its Pareto-optimal set. Specifically, it leverages a novel routing encoder to extract information for both the entire multiobjective aspect and every individual objective from the MOTSP instance. The global embeddings and each objective's embeddings are adaptively aggregated via a routing network to form the subproblems' embedding that can well represent the MOTSP features. Using a modified context embedding, the subproblems' embeddings are fed into a decoder to produce a set of approximate Pareto-optimal solutions in parallel. Additionally, we develop a Top-k baseline to enable more efficient data utilization and lightweight training for our proposed method. We compare our method with heuristic-based and learning-based ones on various types of MOTSP instances, and the experimental results show that our method can solve MOTSP instances in real-time and outperform the other algorithms, especially on large-scale problem instances.
Read full abstract