Abstract

Many sequential decision-making problems require balancing multiple conflicting objectives, which motivates multi-objective reinforcement learning (MORL). Decision-makers typically want a dense set of solutions that satisfy their requirements while capturing the trade-offs between objectives, i.e., Pareto-optimal solutions. Most deep reinforcement learning methods focus on single-objective problems or address multi-objective problems with simple linear combinations of the objectives, which can oversimplify the underlying problem and lead to suboptimal results. This study proposes a neuroevolutionary diversity policy search approach for MORL. Each individual in the population is a neural network policy equipped with a buffer that stores its recent experiences. During evolution, non-dominated sorting and a diversity distance metric are used to select high-quality solutions as teachers. The teachers guide the population through gradient-based genetic operators to produce high-quality offspring, thereby yielding dense Pareto-optimal solutions. Furthermore, we introduce three MORL benchmarks with distinct characteristics: (1) a continuous deep sea treasure with convex and non-convex Pareto fronts; (2) a multi-objective mountain car with sparse rewards and a discontinuous Pareto front; and (3) a multi-objective HalfCheetah with high-dimensional state-action spaces. Experimental results on the three benchmarks demonstrate the superiority of the proposed algorithm in obtaining dense, high-quality Pareto-optimal solutions.
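
For a concrete picture of the teacher-selection step mentioned above, the sketch below illustrates non-dominated sorting combined with a diversity score over the population's vector-valued returns, in the spirit of NSGA-II. It is a minimal illustration under stated assumptions: the function names and the use of objective-space crowding distance as the diversity measure are choices made for this example, not necessarily the paper's exact diversity distance metric, and the gradient-based genetic operators that the teachers apply to produce offspring are not shown.

```python
import numpy as np


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximization)."""
    return bool(np.all(a >= b) and np.any(a > b))


def non_dominated_sort(returns):
    """Split population indices into Pareto fronts; front 0 is non-dominated."""
    n = len(returns)
    dominated_by = [[] for _ in range(n)]   # policies that i dominates
    dom_count = np.zeros(n, dtype=int)      # number of policies dominating i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(returns[i], returns[j]):
                dominated_by[i].append(j)
            elif dominates(returns[j], returns[i]):
                dom_count[i] += 1
        if dom_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(returns, front):
    """Objective-space diversity score per member (larger = more isolated)."""
    dist = {i: 0.0 for i in front}
    for m in range(returns.shape[1]):
        order = sorted(front, key=lambda i: returns[i, m])
        lo, hi = returns[order[0], m], returns[order[-1], m]
        dist[order[0]] = dist[order[-1]] = np.inf  # keep extreme points
        if hi - lo < 1e-12:
            continue
        for left, mid, right in zip(order, order[1:], order[2:]):
            dist[mid] += (returns[right, m] - returns[left, m]) / (hi - lo)
    return dist


def select_teachers(returns, n_teachers):
    """Pick teachers by Pareto rank, breaking ties with the diversity score."""
    chosen = []
    for front in non_dominated_sort(returns):
        dist = crowding_distance(returns, front)
        for i in sorted(front, key=lambda i: -dist[i]):
            if len(chosen) == n_teachers:
                return chosen
            chosen.append(i)
    return chosen


# Hypothetical usage: two-objective episodic returns for a population of six policies.
population_returns = np.array(
    [[1.0, 8.0], [2.0, 6.0], [3.0, 3.0], [2.5, 2.0], [0.5, 7.0], [4.0, 1.0]]
)
print(select_teachers(population_returns, n_teachers=3))
```

In a selection rule of this form, the Pareto rank pushes the population toward the front, while the diversity score spreads the selected teachers across it, which is what makes a dense approximation of the Pareto set possible.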
