Abstract
We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
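The training setup described above (a single-layer Q-network with Xavier-style scaling, updated by stochastic gradient steps on the squared Bellman error) can be sketched as follows. This is a minimal illustrative sketch in numpy, not the paper's implementation; the class and method names, the tanh activation, and all hyperparameters are assumptions made here for concreteness.

```python
import numpy as np

class SingleLayerQNet:
    """Hypothetical single-layer Q-network with N hidden units and
    Xavier-style O(1/sqrt(fan_in)) initialization plus 1/sqrt(N) output scaling."""

    def __init__(self, state_dim, num_actions, N, rng):
        self.N = N
        self.W = rng.normal(0.0, 1.0 / np.sqrt(state_dim), size=(N, state_dim))
        self.b = np.zeros(N)
        self.c = rng.normal(0.0, 1.0, size=(num_actions, N))

    def q_values(self, x):
        h = np.tanh(self.W @ x + self.b)        # hidden-layer activations
        return (self.c @ h) / np.sqrt(self.N)   # normalized output layer

    def sgd_q_learning_step(self, x, a, r, x_next, gamma, lr):
        """One stochastic (semi-)gradient Q-learning update on (x, a, r, x')."""
        target = r + gamma * np.max(self.q_values(x_next))  # bootstrapped target, held fixed
        h = np.tanh(self.W @ x + self.b)
        q_sa = (self.c[a] @ h) / np.sqrt(self.N)
        delta = q_sa - target                                # temporal-difference error
        # Gradients of 0.5 * delta**2 with respect to the network parameters.
        grad_pre = delta * self.c[a] / np.sqrt(self.N) * (1.0 - h ** 2)
        self.c[a] -= lr * delta * h / np.sqrt(self.N)
        self.W -= lr * np.outer(grad_pre, x)
        self.b -= lr * grad_pre
```

The 1/sqrt(N) output scaling is the point of contact with the Xavier-initialization regime studied in the paper: it keeps the network output of order one as the number of hidden units N grows, which is the scaling under which the large-N, many-steps limit is taken.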
Highlights
Reinforcement learning with neural networks has had a number of recent successes, including learning to play video games (Mnih et al., 2013, 2015), mastering the game of Go (Silver et al., 2017), and robotics (Kober and Peters, 2012).
We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large.
We prove that the Q-network converges to the solution of a random ordinary differential equation (ODE).
Summary
Reinforcement learning with neural networks (frequently called "deep reinforcement learning") has had a number of recent successes, including learning to play video games (Mnih et al., 2013, 2015), mastering the game of Go (Silver et al., 2017), and robotics (Kober and Peters, 2012). The presence of a neural network in the Q-learning algorithm introduces technical challenges: in the infinite time horizon case, we can prove convergence of the limiting ODE to the stationary solution only for small values of the discount factor. The situation is different in the finite time horizon case, where we can prove that the limit ODE converges to a global minimum, which is the solution of the associated Bellman equation, for all values of the discount factor. In addition to characterizing the limiting behavior of the neural network as the number of hidden units and stochastic gradient descent steps grow to infinity, we show that in the limit the neural network converges to a global minimum with zero training loss (see Section 4).
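For reference, the Bellman optimality equation that characterizes the stationary solution can be written, in generic notation (the paper's precise state and action spaces may differ), as:

```latex
Q^{*}(x,a) \;=\; \mathbb{E}\!\left[\, r(x,a) \,+\, \gamma \max_{a'} Q^{*}(x',a') \;\middle|\; x,a \right],
```

where $\gamma$ is the discount factor and $x'$ is the next state reached from state $x$ under action $a$. A Q-function satisfying this fixed-point equation yields the optimal control by acting greedily with respect to it.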