Finite-Time Error Bounds of Biased Stochastic Approximation With Application to TD-Learning

Gang Wang

doi:10.1109/tsp.2021.3128723

Abstract

Motivated by the recent success of reinforcement learning algorithms, this paper studies a class of biased stochastic approximation (SA) procedures under a mild “ergodicity-like” assumption on the random noise sequence. Building on a multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (thus gaining control over the bias), we prove a general result on the convergence of the SA iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint renders finite-time analysis of biased SA algorithms under a family of stochastic perturbations possible. For direct comparison with prior work, we demonstrate these bounds by applying them to TD-learning with linear function approximation, under the Markov chain observation model. The resultant finite-time error bound for TD-learning is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without any modification to the updates) using even nonlinear approximators; as well as for Markov chains ii) under sublinear mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.
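The setting described in the abstract (TD-learning with linear function approximation, a constant stepsize, and samples drawn along a single Markov chain trajectory from an arbitrary initial state) can be illustrated with a small simulation. The sketch below is not the paper's analysis or code. The three-state chain, rewards, features, stepsize, and the function names `td0_linear` and `td_fixed_point` are all hypothetical choices made for illustration. It runs plain TD(0) and tracks the squared distance of the iterate to the TD fixed point, which is the quantity a finite-time mean-square error bound controls.

```python
import numpy as np

def td0_linear(P, r, Phi, gamma, alpha, num_steps, s0=0, seed=0):
    """Unmodified TD(0) with linear features and a constant stepsize.

    Samples come from one Markov trajectory started at s0, so the
    update direction is a biased estimate of its stationary mean.
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    theta = np.zeros(d)
    s = s0
    history = np.empty((num_steps, d))
    for k in range(num_steps):
        s_next = rng.choice(n, p=P[s])
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]
        history[k] = theta
        s = s_next
    return history

def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (I - gamma P) Phi theta = Phi^T D r for theta."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    mu = mu / mu.sum()
    D = np.diag(mu)
    A = Phi.T @ D @ (np.eye(len(r)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)

# Hypothetical 3-state Markov reward process (illustration only).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
gamma = 0.9

theta_star = td_fixed_point(P, r, Phi, gamma)
history = td0_linear(P, r, Phi, gamma, alpha=0.01, num_steps=20000, s0=2)
sq_err = np.sum((history - theta_star) ** 2, axis=1)

print("initial squared error:", np.sum(theta_star ** 2))
print("mean squared error over last 5000 steps:", sq_err[-5000:].mean())
```

With a constant stepsize the iterates are expected to settle into a neighborhood of the fixed point rather than converge exactly, so the tail average of `sq_err` stays small but nonzero. Averaging `sq_err` over many seeds would give an empirical estimate of the mean-square error as a function of the iteration count.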

Journal: IEEE Transactions on Signal Processing
Publication Date: Jan 1, 2022
Citations: 3

Similar Papers

On Stochastic Processes Defined by Differential Equations with a Small Parameter
R Z Has’Minskii
Theory of Probability & Its Applications | VOL. 11
01 Jan 1965

Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
Zaiwei Chen ... Siva Theja Maguluri
Automatica | VOL. 146
28 Sep 2022

On Stochastic Approximation for Random Processes with Continuous Time
T P Krasulina
Theory of Probability & Its Applications | VOL. 16
01 Jan 1970

River flow forecasting through nonlinear local approximation in a fuzzy model
P C Nayak ... K P Sudheer
Neural Computing and Applications | VOL. 25
27 Jul 2014
