In recent years, reinforcement learning (RL) techniques have achieved great success across a wide range of applications. However, their heavy reliance on complex deep neural networks makes most RL models uninterpretable, limiting their application in domains where trust and security are important. To address this challenge, we propose MENS-DT-RL, an algorithm capable of constructing interpretable models for RL via the evolution of decision tree (DT) models. MENS-DT-RL uses a multi-method ensemble algorithm to evolve univariate DTs, guiding the process with a fitness metric that prioritizes interpretability and consistently high performance. Three different initializations for MENS-DT-RL are proposed, including one based on Imitation Learning (IL) techniques, as well as a novel pruning approach that reduces solution size without compromising performance. To evaluate the proposed approach, we compare it with other models from the literature on three benchmark tasks from the OpenAI Gym library, as well as on a fertilization problem inspired by real-world crop management. To the best of our knowledge, the proposed scheme is the first to solve the Lunar Lander benchmark with both interpretability and a high success rate (90% of episodes are successful), as well as the first to solve the Mountain Car environment with a tree of only 7 nodes. On the real-world task, the proposed MENS-DT-RL produces solutions of the same quality as deep RL policies, with the added benefit of interpretability. We also analyze the best solutions found by the algorithm and show that they are not only interpretable but also diverse in their behavior, empowering the end user to choose which model to apply. Overall, the findings show that the proposed approach is capable of producing high-quality transparent models for RL, achieving interpretability without sacrificing performance.
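As a rough illustration of the kind of solution MENS-DT-RL searches for, the sketch below defines a univariate decision-tree policy and a fitness score that trades off episode return against tree size. This is a minimal sketch under stated assumptions, not the paper's implementation: the `Node` structure, the size penalty `alpha`, and the feature thresholds in the 7-node example tree are all illustrative.

```python
# Minimal sketch (not the paper's implementation) of a univariate decision-tree
# policy and a fitness score balancing episode return against tree size.
# Thresholds, feature indices, and the size penalty `alpha` are illustrative only.

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Univariate DT node: tests one feature against one threshold, or emits an action."""
    feature: Optional[int] = None    # index of the observation feature to test
    threshold: float = 0.0           # split threshold for that feature
    left: Optional["Node"] = None    # branch taken when obs[feature] <= threshold
    right: Optional["Node"] = None   # branch taken otherwise
    action: Optional[int] = None     # discrete action emitted at a leaf


def act(tree: Node, obs: Sequence[float]) -> int:
    """Follow univariate splits until a leaf is reached and return its action."""
    node = tree
    while node.action is None:
        node = node.left if obs[node.feature] <= node.threshold else node.right
    return node.action


def count_nodes(tree: Node) -> int:
    """Total number of nodes, used here as a simple interpretability proxy."""
    if tree.action is not None:
        return 1
    return 1 + count_nodes(tree.left) + count_nodes(tree.right)


def fitness(mean_return: float, tree: Node, alpha: float = 0.5) -> float:
    """Higher is better: reward performance, penalize tree size."""
    return mean_return - alpha * count_nodes(tree)


# Hypothetical 7-node policy for a Mountain-Car-like task
# (observation = [position, velocity]; actions = {0: left, 1: no-op, 2: right}).
example_tree = Node(
    feature=1, threshold=0.0,
    left=Node(feature=0, threshold=-0.9,
              left=Node(action=2), right=Node(action=0)),
    right=Node(feature=0, threshold=0.3,
               left=Node(action=2), right=Node(action=2)),
)

print(act(example_tree, [-0.5, 0.01]), count_nodes(example_tree))
```

In an evolutionary loop, candidate trees like `example_tree` would be rolled out in the environment to estimate `mean_return`, and the resulting fitness would guide selection toward policies that are both small and high-performing.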