Abstract

Reinforcement Learning (RL) systems have achieved outstanding performance in domains such as Atari games, finance, healthcare, and self-driving cars. However, their black-box nature complicates their deployment, especially in critical applications such as healthcare. To address this problem, researchers have proposed various approaches for interpreting RL models; some were adapted from machine learning, while others were designed specifically for RL. The main objective of this paper is to present and explain RL interpretation methods, the metrics used to classify them, and how these metrics have been applied to understand the internal workings of RL models. We reviewed papers that propose new RL interpretation methods, improve existing ones, or discuss the advantages and limitations of existing methods.

Highlights

  • Reinforcement learning algorithms have achieved good performance across multiple domains

  • Reinforcement Learning (RL) interpretability lacks the variety and depth of survey papers available for Machine Learning (ML) interpretability, although some researchers have made notable efforts to review and categorize the work related to their proposed approaches, such as Nikulin et al. [2] in their classification of saliency maps

  • Murdoch et al. [10] proposed the predictive, descriptive, relevant (PDR) framework for ML evaluation that takes interpretability into account. Their framework evaluates ML systems based on predictive accuracy, descriptive accuracy, and relevancy, rather than relying solely on the predictive accuracy used in the typical ML evaluation pipeline (a minimal sketch of this idea follows the list)
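A minimal sketch of the PDR idea described above, assuming an already-fitted model and an interpretable surrogate of it; the function names, the `surrogate_preds` argument, and the scoring scheme are illustrative assumptions, not Murdoch et al.'s implementation:

```python
import numpy as np

def predictive_accuracy(model_preds, labels):
    # How well the model predicts the ground truth (the usual ML metric).
    return float(np.mean(np.asarray(model_preds) == np.asarray(labels)))

def descriptive_accuracy(surrogate_preds, model_preds):
    # How faithfully the interpretable surrogate reproduces the model's behavior.
    return float(np.mean(np.asarray(surrogate_preds) == np.asarray(model_preds)))

def pdr_report(model_preds, surrogate_preds, labels, relevancy_score):
    # relevancy_score is assumed to be judged by domain experts, not computed.
    return {
        "predictive_accuracy": predictive_accuracy(model_preds, labels),
        "descriptive_accuracy": descriptive_accuracy(surrogate_preds, model_preds),
        "relevancy": relevancy_score,
    }
```

The point of the sketch is only that evaluation becomes a three-part report rather than a single accuracy number.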


Summary

INTRODUCTION

Reinforcement learning algorithms have achieved good performance across multiple domains. However, our inability to explain and justify their decisions makes it harder to deploy RL systems in critical fields such as healthcare, where interpretability is necessary [1]. Researchers have proposed various RL interpretation methods and applied them to multiple applications. Still, RL interpretability lacks the variety and depth of survey papers available for ML interpretability, although some researchers have made notable efforts to review and categorize the work related to their proposed approaches, such as Nikulin et al. [2] in their classification of saliency maps. The goal of this paper is to list, classify, and compare the different methods used to interpret RL, giving engineers and researchers using RL a means to compare interpretability approaches and choose the most appropriate method for their RL systems.
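Since saliency maps recur throughout the survey, a minimal, hypothetical sketch of one common gradient-based formulation may help fix the idea; `policy_net` is an assumed PyTorch module mapping a single observation tensor to action logits, and this is only one of the saliency variants that Nikulin et al. [2] classify:

```python
import torch

def saliency_map(policy_net, observation: torch.Tensor) -> torch.Tensor:
    """Magnitude of d(logit of greedy action) / d(observation)."""
    obs = observation.clone().detach().requires_grad_(True)
    logits = policy_net(obs.unsqueeze(0)).squeeze(0)  # add/remove batch dim
    greedy_action = torch.argmax(logits)              # action the policy would take
    logits[greedy_action].backward()                  # backprop to the input
    return obs.grad.abs()                             # per-feature saliency
```

The resulting map highlights which observation features most influence the chosen action; perturbation-based alternatives replace the gradient with the effect of locally perturbing the input.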

RELATED WORK
BACKGROUND
TYPES OF RL SYSTEMS
HIGH LEVEL OVERVIEW
SCOPE OF INTERPRETATION
MODEL RECONCILIATION
CONCLUSION

