Statistical Inference of the Value Function for Reinforcement Learning in Infinite-Horizon Settings

Chengchun Shi, Sheng Zhang, Rui Song, Wenbin Lu

doi:10.1111/rssb.12465

Abstract

Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision-making problems. The goodness of a policy is measured by its value function starting from some initial state. The focus of this paper is to construct confidence intervals (CIs) for a policy’s value in infinite-horizon settings where the number of decision points diverges to infinity. We propose to model the action-value state function (Q-function) associated with a policy based on the series/sieve method to derive its confidence interval. When the target policy depends on the observed data as well, we propose a SequentiAl Value Evaluation (SAVE) method to recursively update the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique. Simulation studies are conducted to back up our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patients’ health status. A Python implementation of the proposed procedure is available at https://github.com/shengzhang37/SAVE.
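As a rough illustration of the idea of fitting a Q-function with basis functions and turning the fit into a CI for the value, the sketch below may help. It is not the authors' SAVE implementation. It assumes a fixed (not data-dependent) target policy, a hypothetical two-state, two-action toy MDP, and an indicator basis, which is the simplest special case of a series basis. All names (`P_NEXT1`, `MEAN_R`, `TARGET`, `REF`, `simulate`, `value_ci`) are made up for this example. The estimator solves the empirical Bellman equation and uses a sandwich variance to form a Wald-type interval.

```python
import random
from statistics import NormalDist

GAMMA = 0.5                      # discount factor (assumed)
N_STATES, N_ACTIONS = 2, 2
# Hypothetical toy MDP: P_NEXT1[s][a] = P(next state = 1 | s, a).
P_NEXT1 = [[0.2, 0.7], [0.4, 0.9]]
MEAN_R = [[0.0, 1.0], [0.5, 2.0]]   # mean reward for each (s, a)
TARGET = [1, 0]                  # fixed deterministic target policy: state -> action
REF = [0.5, 0.5]                 # reference distribution over initial states


def idx(s, a):
    """Position of the indicator basis function for the pair (s, a)."""
    return s * N_ACTIONS + a


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def simulate(n_traj, horizon, rng):
    """Collect (s, a, r, s_next) tuples under a uniformly random behaviour policy."""
    data = []
    for _ in range(n_traj):
        s = rng.randrange(N_STATES)
        for _ in range(horizon):
            a = rng.randrange(N_ACTIONS)
            r = MEAN_R[s][a] + rng.gauss(0.0, 0.5)
            s_next = 1 if rng.random() < P_NEXT1[s][a] else 0
            data.append((s, a, r, s_next))
            s = s_next
    return data


def value_ci(data, alpha=0.05):
    """Point estimate and Wald CI for the target policy's value under REF."""
    n, d = len(data), N_STATES * N_ACTIONS
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, a, r, s_next in data:
        i, j = idx(s, a), idx(s_next, TARGET[s_next])
        A[i][i] += 1.0 / n       # basis(s, a) times itself
        A[i][j] -= GAMMA / n     # minus gamma * basis at (s_next, target action)
        b[i] += r / n
    beta = solve(A, b)           # estimated Q-value for each (s, a)
    w = [0.0] * d                # weights that turn beta into the integrated value
    for s in range(N_STATES):
        w[idx(s, TARGET[s])] += REF[s]
    value = sum(wi * bi for wi, bi in zip(w, beta))
    # Sandwich variance: with v = A^{-T} w, sigma^2 = mean of (v[i] * TD residual)^2.
    v = solve([list(col) for col in zip(*A)], w)
    sigma2 = 0.0
    for s, a, r, s_next in data:
        i, j = idx(s, a), idx(s_next, TARGET[s_next])
        resid = r + GAMMA * beta[j] - beta[i]
        sigma2 += (v[i] * resid) ** 2 / n
    half = NormalDist().inv_cdf(1 - alpha / 2) * (sigma2 / n) ** 0.5
    return value, value - half, value + half
```

Calling `value_ci(simulate(50, 100, random.Random(0)))` returns the point estimate with the lower and upper CI limits. Here the total number of transitions grows with either the number of trajectories or the horizon, which mirrors the asymptotic regime described in the abstract. A data-dependent target policy would need the sequential updating that SAVE provides, which this sketch does not attempt.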

Journal: Journal of the Royal Statistical Society Series B: Statistical Methodology	Publication Date: Dec 22, 2021
Citations: 13	License type: mit

Similar Papers

Reinforcement learning with Gaussian processes for condition-based maintenance
Shenglin Peng ... Qianmei (May) Feng
Computers & Industrial Engineering | VOL. 158
16 Apr 2021

Exploration Exploitation Problem in Policy Based Deep Reinforcement Learning for Episodic and Continuous Environments
Vedang Naik ... Saurabh Singh
International Journal of Engineering and Advanced Technology | VOL. 11
30 Dec 2021

Successive Over-Relaxation Q-Learning
Chandramouli Kamanchi ... Shalabh Bhatnagar
IEEE Control Systems Letters | VOL. 4
01 Jan 2020

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation
Yang Gao ... Mohsen Mesgar
01 Aug 2019
