An information-theoretic analysis of return maximization in reinforcement learning

Kazunori Iwata

doi:10.1016/j.neunet.2011.05.002

Abstract

We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition.
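The asymptotic equipartition property (AEP) mentioned in the abstract can be illustrated in its simplest textbook form. The following is a minimal sketch, assuming an i.i.d. discrete source, which is far more restrictive than the processes the paper considers; it is not the paper's construction. The function names, the distribution `p`, and the tolerance `eps` are hypothetical choices made for illustration. The sketch checks how often the per-symbol negative log-probability of a sampled sequence falls within `eps` of the source entropy, which is the usual (weak) notion of a typical sequence.

```python
import math
import random


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def sample_rate_bits(p, n, rng):
    """Draw n i.i.d. symbols from p and return -(1/n) * log2 P(sequence)."""
    symbols = rng.choices(range(len(p)), weights=p, k=n)
    return -sum(math.log2(p[s]) for s in symbols) / n


def is_typical(rate, h, eps):
    """Weak typicality: per-symbol log-probability within eps of the entropy."""
    return abs(rate - h) <= eps


# Illustrative (assumed) source and parameters, not taken from the paper.
rng = random.Random(0)
p = [0.7, 0.2, 0.1]
h = entropy_bits(p)
rates = [sample_rate_bits(p, 2000, rng) for _ in range(200)]
typical_fraction = sum(is_typical(r, h, 0.1) for r in rates) / len(rates)
print(f"entropy = {h:.3f} bits, typical fraction = {typical_fraction:.2f}")
```

As the sequence length grows, the fraction of sampled sequences that are typical approaches one for such a source; the abstract's claim concerns how this typical set overlaps with the set of best sequences in terms of return.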

Journal: Neural Networks: The Official Journal of the International Neural Network Society
Publication Date: May 17, 2011
Citations: 23
