Abstract

Reinforcement learning (RL) in Markov Decision Processes is studied with an emphasis on the well-studied exploration problem. We first formulate and discuss a definition of “efficient” algorithms, termed Probably Approximately Correct (PAC) in RL. Next, we provide general sufficient conditions for such an algorithm that apply under several different modeling assumptions. The conditions can be used to demonstrate that efficient learning is possible in finite MDPs, with either a model-based or model-free approach, in factored MDPs, and in continuous MDPs with linear dynamics.

In the reinforcement-learning (RL) problem (Sutton & Barto 1998), an agent acts in an unknown or incompletely known environment with the goal of maximizing an external reward signal. In the most standard mathematical formulation of the problem, the environment is modeled as a Markov Decision Process (MDP) and the goal of the agent is to obtain near-optimal discounted return. Over the years, many algorithms have been proposed for this problem, but analyses of their performance have been relatively scarce. In fact, until recently, most theoretical guarantees have been that certain algorithms discover an optimal policy in the limit, after an infinite amount of experience. In contrast, several attempts have been made to study “Probably Approximately Correct” or PAC-MDP algorithms, which exhibit near-optimal behavior in polynomial time and experience. This paper discusses several extensions of those results. We present a theorem that provides sufficient conditions for an algorithm to be PAC-MDP. We examine these conditions and show how they can be applied to prove that efficient learning is possible in three interesting scenarios: finite MDPs (i.e., the “tabular” case), factored MDPs, and continuous MDPs with linear dynamics.
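For concreteness, the notion of efficiency referred to above is usually stated in terms of the sample complexity of exploration; the following is a sketch of that standard criterion, and the paper's precise definition may differ in its details. Let A_t denote the algorithm's (non-stationary) policy at time t, s_t the state visited at time t, V^{A_t} its value function, and V^* the optimal value function. An algorithm is PAC-MDP if, for any ε > 0 and δ ∈ (0, 1), with probability at least 1 − δ,

\[
\bigl|\{\, t : V^{A_t}(s_t) < V^*(s_t) - \epsilon \,\}\bigr| \;\le\; \mathrm{poly}\!\Bigl(|S|,\ |A|,\ \tfrac{1}{\epsilon},\ \tfrac{1}{\delta},\ \tfrac{1}{1-\gamma}\Bigr),
\]

i.e., the number of timesteps on which the algorithm acts more than ε-suboptimally is bounded by a polynomial in the relevant problem quantities (for finite MDPs, typically |S|, |A|, 1/ε, 1/δ, and 1/(1 − γ)).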

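To make the finite (tabular), model-based case concrete, the sketch below shows an R-max-style learner of the kind such sufficient conditions are typically applied to: state–action pairs with fewer than m samples are treated optimistically, and the agent acts greedily with respect to a plan computed on the resulting optimistic model. This is an illustrative sketch under assumed parameter names (m, gamma, r_max) and a hypothetical class RMaxAgent, not the specific algorithm analyzed in the paper.

import numpy as np

class RMaxAgent:
    """Illustrative R-max-style tabular, model-based learner (a sketch).
    Unknown state-action pairs (fewer than m samples) are modeled optimistically
    as worth R_max forever, which is what drives systematic exploration."""

    def __init__(self, n_states, n_actions, gamma=0.95, m=20, r_max=1.0):
        self.nS, self.nA = n_states, n_actions
        self.gamma, self.m, self.r_max = gamma, m, r_max
        self.counts = np.zeros((n_states, n_actions))                 # visit counts n(s, a)
        self.trans_counts = np.zeros((n_states, n_actions, n_states)) # empirical transitions
        self.reward_sums = np.zeros((n_states, n_actions))            # empirical rewards
        self.Q = np.full((n_states, n_actions), r_max / (1 - gamma))  # optimistic initialization

    def act(self, s):
        # Greedy action with respect to the current (optimistic) value estimates.
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        # Collect experience only until (s, a) becomes "known"; then replan once.
        if self.counts[s, a] < self.m:
            self.counts[s, a] += 1
            self.trans_counts[s, a, s_next] += 1
            self.reward_sums[s, a] += r
            if self.counts[s, a] == self.m:
                self._plan()

    def _plan(self, n_iters=200, tol=1e-6):
        # Value iteration on the optimistic empirical model: known (s, a) pairs use
        # empirical estimates; unknown pairs keep the optimistic value R_max / (1 - gamma).
        known = self.counts >= self.m
        P = np.where(known[..., None],
                     self.trans_counts / np.maximum(self.counts[..., None], 1), 0.0)
        R = np.where(known, self.reward_sums / np.maximum(self.counts, 1), 0.0)
        Q = self.Q.copy()
        for _ in range(n_iters):
            V = Q.max(axis=1)
            Q_new = np.where(known,
                             R + self.gamma * (P @ V),
                             self.r_max / (1 - self.gamma))
            if np.max(np.abs(Q_new - Q)) < tol:
                Q = Q_new
                break
            Q = Q_new
        self.Q = Q

A typical interaction loop calls agent.act(s), executes the action in the environment, and then calls agent.update(s, a, r, s_next); PAC-MDP guarantees of the kind discussed above bound, with high probability, the number of timesteps such an agent can spend acting more than ε-suboptimally.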