Abstract

Sample inefficiency is a major obstacle to deploying deep reinforcement learning methods in real-world tasks, which naturally feature sparse rewards. Moreover, learning from scratch is often impractical in environments where extreme negative outcomes are possible. Recent advances in imitation learning have improved sample efficiency by leveraging expert demonstrations. Most work along this line of research employs neural network-based approaches to recover an expert cost function. However, the complexity and opacity of neural networks make them difficult to trust and deploy in the real world. In contrast, we present a method for extracting interpretable symbolic reward functions from expert data, which offers several advantages. First, the learned reward function can be parsed by a human to understand, verify, and predict the behavior of the agent. Second, the reward function can be improved and modified by an expert. Finally, the structure of the reward function can be leveraged to extract explanations that encode richer domain knowledge than standard scalar rewards. To this end, we use an autoregressive recurrent neural network that generates hierarchical symbolic rewards represented by simple symbolic trees. The recurrent neural network is trained via risk-seeking policy gradients. We test our method in MuJoCo environments as well as on a chemical plant simulator. We show that the discovered rewards can significantly accelerate training and achieve performance similar to or better than that of neural network-based algorithms.
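
The abstract mentions training the expression-generating RNN with risk-seeking policy gradients, i.e., maximizing the expected reward of only the best-performing fraction of sampled expressions. The sketch below illustrates that general recipe; it is not the authors' implementation, and the `policy.sample()` interface, `evaluate_reward` scorer, and `eps` parameter are illustrative assumptions.

```python
# Minimal sketch of a risk-seeking policy-gradient (REINFORCE) update that
# optimizes only the top (1 - eps) quantile of sampled candidates.
# Assumptions: `policy` is a PyTorch module with a hypothetical sample()
# method returning sequences and their summed log-probabilities, and
# `evaluate_reward` scores a candidate symbolic reward expression.
import torch

def risk_seeking_update(policy, optimizer, evaluate_reward,
                        batch_size=1000, eps=0.05):
    # Sample a batch of candidate reward expressions from the RNN policy.
    sequences, log_probs = policy.sample(batch_size)   # log_probs: (batch,)

    # Score each candidate (e.g., by the return achieved when training
    # an agent with that symbolic reward).
    rewards = torch.tensor([evaluate_reward(s) for s in sequences],
                           dtype=torch.float32)

    # Empirical (1 - eps) quantile of the batch rewards.
    r_eps = torch.quantile(rewards, 1.0 - eps)

    # Keep only samples above the quantile; the quantile acts as a baseline.
    mask = rewards >= r_eps
    advantages = (rewards - r_eps) * mask.float()

    # Risk-seeking REINFORCE loss over the elite fraction of the batch.
    loss = -(advantages.detach() * log_probs).sum() / mask.float().sum().clamp(min=1.0)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.max().item()
```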
