Abstract

Reinforcement learning based on deep neural networks has attracted much attention and has been widely used in real-world applications. However, its black-box nature limits its adoption in high-stakes areas such as manufacturing and healthcare. To address this problem, some researchers resort to interpretable control policy generation algorithms. The basic idea is to use an interpretable model, such as tree-based genetic programming, to extract a policy from a black-box model, such as a neural network. Following this idea, in this paper we apply yet another form of genetic programming, evolutionary feature synthesis, to extract control policies from neural networks. We also propose an evolutionary method that automatically optimizes the operator set of the control policy for each specific problem, and we introduce a policy simplification strategy. We conduct experiments on four reinforcement learning environments. The results reveal that evolutionary feature synthesis can extract policies from neural networks with better performance than tree-based genetic programming and comparable interpretability.
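The extraction idea described above can be sketched as behavioral cloning: query the black-box policy on sampled states, then fit an interpretable surrogate to its decisions. The sketch below is a minimal illustration, not the paper's method; the teacher function, the 2-dimensional state, and the grid search (standing in for the evolutionary search over synthesized features) are all hypothetical assumptions.

```python
import random

def teacher_policy(state):
    # Stand-in for a trained black-box network: bang-bang control
    # on a fixed weighted sum of the state variables.
    x, v = state
    return 1 if 0.8 * x + 1.3 * v > 0 else 0

# 1. Query the teacher on sampled states to build a dataset.
random.seed(0)
states = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
actions = [teacher_policy(s) for s in states]

# 2. Fit an interpretable surrogate rule w1*x + w2*v > 0 by crude
#    grid search (a placeholder for evolutionary feature synthesis,
#    which would also evolve nonlinear feature combinations).
best = None
for w1 in [i / 10 for i in range(-20, 21)]:
    for w2 in [i / 10 for i in range(-20, 21)]:
        preds = [1 if w1 * x + w2 * v > 0 else 0 for x, v in states]
        acc = sum(p == a for p, a in zip(preds, actions)) / len(actions)
        if best is None or acc > best[0]:
            best = (acc, w1, w2)

acc, w1, w2 = best
print(f"surrogate: {w1:.1f}*x + {w2:.1f}*v > 0, fidelity = {acc:.2f}")
```

The resulting rule is a single readable inequality, which is the kind of interpretability the paper targets; the fidelity score measures how closely the surrogate mimics the teacher on the sampled states.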

Highlights

  • Reinforcement learning [31] has shown its extraordinary performance in computer games [22] and other real-world applications [29]

  • We propose a method to extract policy from the pre-trained deep neural network based on the evolutionary feature synthesis (EFS) algorithm [3]

  • We find that EFS is superior to genetic programming and surpasses traditional interpretable machine learning models

Introduction

Reinforcement learning [31] has shown its extraordinary performance in computer games [22] and other real-world applications [29]. In [11], an explainable reinforcement learning policy model is built using the tree-based genetic programming (GP) [24] algorithm. It is argued in [11] that it is hard for GP to mimic the behavior of a deep neural network. EFS is the first model to demonstrate that a genetic programming algorithm can derive a policy directly from a DNN and achieve comparable performance with stronger interpretability. The rest of this paper is organized as follows: the next section introduces related work on reinforcement learning based on genetic programming algorithms. We conclude this paper with some future work in the last section.

