Abstract
Apprenticeship learning (AL) is a class of Learning from Demonstration (LfD) techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent must derive a good policy by observing an expert's demonstrations. In this paper, we study how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that ensures safety while retaining the performance of the learned policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
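To make the setting concrete, the following is a minimal sketch (our illustration, not code from the paper) of a reward that is linear in state features, alongside a PRISM-style PCTL safety property bounding the probability of ever reaching an unsafe state. The feature map phi, the state fields, and the 0.05 threshold are all hypothetical.

```python
import numpy as np

# Hypothetical feature map: each state is summarized by a fixed-length
# vector phi(s); the unknown reward is assumed to be linear in it.
def phi(state):
    # Illustrative features, e.g. distance to goal and to nearest obstacle.
    return np.array([state["dist_to_goal"], state["dist_to_obstacle"]])

def reward(state, w):
    # R(s) = w . phi(s), where w is the weight vector that inverse
    # reinforcement learning tries to estimate from demonstrations.
    return float(np.dot(w, phi(state)))

# A PRISM-style PCTL safety property: "the probability of eventually
# reaching an unsafe state is at most 5%" (threshold illustrative).
PCTL_SAFETY = 'P<=0.05 [ F "unsafe" ]'
```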
Highlights
– The rapid progress of artificial intelligence (AI) comes with a growing concern over its safety when deployed in real-life systems and situations.
– We consider safety specifications expressed in Probabilistic Computation Tree Logic (PCTL) and show how probabilistic model checking can be used to ensure safety while retaining the performance of a learning algorithm known as apprenticeship learning (AL).
– We develop a novel algorithm called CounterExample Guided Apprenticeship Learning (CEGAL) that combines probabilistic model checking with the optimization-based approach of apprenticeship learning; a high-level sketch of such a loop follows this list.
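The sketch below shows one plausible shape for a counterexample-guided loop of this kind, under our own assumptions rather than the paper's exact formulation. The helpers apprenticeship_step, model_check, and extract_counterexample are hypothetical stand-ins for an AL optimizer, a probabilistic model checker (e.g., a tool such as PRISM), and a counterexample generator.

```python
def cegal(mdp, expert_mu, pctl_spec, max_iters=50):
    """Counterexample-guided apprenticeship learning, high-level sketch.

    apprenticeship_step, model_check, and extract_counterexample are
    hypothetical helpers standing in for an AL optimizer, a PCTL model
    checker, and a counterexample generator, respectively.
    """
    constraints = []  # counterexamples accumulated across iterations
    for _ in range(max_iters):
        # Propose a policy that matches the expert's feature expectations,
        # subject to constraints derived from past counterexamples.
        policy = apprenticeship_step(mdp, expert_mu, constraints)
        # Verify the Markov chain induced by the candidate policy
        # against the PCTL safety specification.
        ok, witness = model_check(mdp, policy, pctl_spec)
        if ok:
            return policy  # safe and (approximately) expert-matching
        # Otherwise refine: rule out the offending behavior next round.
        constraints.append(extract_counterexample(witness))
    raise RuntimeError("no safe policy found within the iteration budget")
```

The design intuition is that each counterexample shrinks the space of candidate policies, so the optimizer cannot repeatedly propose policies that violate the safety property in the same way.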
Summary
The rapid progress of artificial intelligence (AI) comes with a growing concern over its safety when deployed in real-life systems and situations. The concept of AL is closely related to reinforcement learning (RL), in which an agent learns what actions to take in an environment (known as a policy) by maximizing some notion of long-term reward. In AL, the agent is not given the reward function; instead, it must first estimate the reward from a set of expert demonstrations via a technique called inverse reinforcement learning [18]. The expert demonstrates the task by maximizing this reward function, and the agent tries to derive a policy that matches the feature expectations of the expert's demonstrations. One issue with LfD is that the expert can often demonstrate only how the task works, not how it may fail, because failure may cause irrecoverable damage to the system, such as crashing a vehicle. As a result, even if all the demonstrations are safe, the agent may still end up learning an unsafe policy.
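Feature-expectation matching, in the style of classic apprenticeship learning, can be made concrete with a short sketch. Here we estimate the expert's discounted feature expectations empirically from demonstration trajectories; phi is a hypothetical feature map as in the earlier sketch, and trajectories are assumed to be lists of states.

```python
import numpy as np

def empirical_feature_expectations(trajectories, phi, gamma=0.99):
    """Estimate mu_E = E[sum_t gamma^t * phi(s_t)] by averaging the
    discounted feature sums over the expert's demonstration trajectories."""
    mu = None
    for traj in trajectories:
        acc = sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        mu = acc if mu is None else mu + acc
    return mu / len(trajectories)
```

A learned policy is then considered a good match when its own feature expectations are close to this estimate, which (under the linear-reward assumption) bounds the gap in expected reward regardless of the true weight vector.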