Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning

Pratik Ramprasad, Yuantong Li, Zhuoran Yang, Zhaoran Wang, Will Wei Sun, Guang Cheng

doi:10.1080/01621459.2022.2096620

Abstract

The recent emergence of reinforcement learning (RL) has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for inference in online learning are restricted to settings involving independently sampled observations, while inference methods in RL have so far been limited to the batch setting. The bootstrap is a flexible and efficient approach for statistical inference in online learning algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this article, we study the use of the online bootstrap method for inference in RL policy evaluation. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm across a range of real RL environments. Supplementary materials for this article are available online.
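
As an illustration of the idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It runs tabular TD(0) on a hypothetical two-state Markov reward process and, alongside the main estimate, maintains perturbed copies whose updates are scaled by random multiplier weights with mean 1 and variance 1. The chain, the rewards, the step-size schedule, the choice of exponential weights, and the names td_online_bootstrap and percentile_interval are all assumptions made for this example; the abstract does not specify any of them.

```python
import random


def td_online_bootstrap(n_steps=3000, n_boot=50, gamma=0.5, seed=0):
    """Tabular TD(0) with perturbed (multiplier-bootstrap) copies.

    Everything about the environment here is a toy assumption.
    """
    rng = random.Random(seed)
    # Hypothetical 2-state Markov reward process under a fixed policy.
    P = [[0.7, 0.3], [0.4, 0.6]]  # transition probabilities
    R = [1.0, 0.0]                # reward received on leaving each state

    theta = [0.0, 0.0]                           # main TD(0) estimate
    boots = [[0.0, 0.0] for _ in range(n_boot)]  # perturbed estimates
    s = 0
    for t in range(1, n_steps + 1):
        s_next = 0 if rng.random() < P[s][0] else 1
        r = R[s]
        alpha = 0.5 / t ** 0.6  # assumed polynomially decaying step size

        # Standard TD(0) update on the observed transition.
        delta = r + gamma * theta[s_next] - theta[s]
        theta[s] += alpha * delta

        # Each bootstrap copy sees the same transition, but its update
        # is scaled by an independent random weight (mean 1, variance 1).
        for b in boots:
            w = rng.expovariate(1.0)
            delta_b = r + gamma * b[s_next] - b[s]
            b[s] += alpha * w * delta_b

        s = s_next
    return theta, boots


def percentile_interval(boots, idx, level=0.95):
    """Percentile interval for coordinate idx from the perturbed copies."""
    vals = sorted(b[idx] for b in boots)
    lo = vals[int((1 - level) / 2 * (len(vals) - 1))]
    hi = vals[int((1 + level) / 2 * (len(vals) - 1))]
    return lo, hi


if __name__ == "__main__":
    theta, boots = td_online_bootstrap()
    print("TD estimate:", theta)
    print("95% interval for state 0:", percentile_interval(boots, 0))
```

Because every perturbed copy is updated from the same stream of transitions, no past data needs to be stored, which is what makes this style of bootstrap usable online. The percentile interval is only one of several ways to turn the perturbed copies into a confidence interval.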


Journal: Journal of the American Statistical Association
Publication Date: Jul 20, 2022
Citations: 4

Similar Papers

Value function uncertainty as a cognitive map for reinforcement learning
Daw Nathaniel
Frontiers in Systems Neuroscience | VOL. 3
01 Jan 2009

Online learning algorithms: For passivity-based and distributed control
03 May 2016

Reinforcement Learning in System Identification
Mariela Cerrada ... Jose Aguilar
01 Jan 2008

Reinforcement Learning Using Kohonen Feature Map Probabilistic Associative Memory Based on Weights Distribution
Yuko Osana
14 Jan 2011
