Average reward reinforcement learning: Foundations, algorithms, and empirical results

Sridhar Mahadevan

doi:10.1007/bf00114727

Abstract

This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms is described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal policies that maximize average reward, none of them can reliably filter these to produce bias-optimal (or T-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.
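The abstract's point about estimating the average reward and the relative values independently can be made concrete with a minimal sketch of a tabular R-learning update. This is an illustration under stated assumptions, not code from the paper: the function names, the two-state toy task, the learning rates `alpha` and `beta`, and the epsilon-greedy exploration rule are all hypothetical choices, and the average-reward update here uses the relative values as they stood before the current step's update, which is only one of several possible variants.

```python
import random

def r_learning_step(R, rho, s, a, r, s_next, greedy, alpha=0.05, beta=0.1):
    """Apply one R-learning update in place and return the new rho.

    R is a table of relative action values, rho the average-reward estimate.
    The two quantities are updated by separate rules with separate step sizes.
    """
    best_next = max(R[s_next])
    best_here = max(R[s])  # read before R[s][a] is modified (an assumption)
    # Relative-value update: reward is measured against the current rho.
    R[s][a] += beta * (r - rho + best_next - R[s][a])
    # Average-reward update: only applied when the action was chosen greedily.
    if greedy:
        rho += alpha * (r + best_next - best_here - rho)
    return rho

# Hypothetical two-state cyclic task: (state, action) -> (next_state, reward).
TOY_MDP = {
    (0, 0): (0, 1.0),
    (0, 1): (1, 0.0),
    (1, 0): (0, 0.0),
    (1, 1): (1, 2.0),
}

def run(steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy R-learning on TOY_MDP; return (R, rho)."""
    rng = random.Random(seed)
    R = [[0.0, 0.0], [0.0, 0.0]]
    rho, s = 0.0, 0
    for _ in range(steps):
        # A step counts as exploratory whenever the random branch is taken,
        # even if it happens to pick the greedy action.
        greedy = rng.random() >= epsilon
        a = R[s].index(max(R[s])) if greedy else rng.randrange(2)
        s_next, r = TOY_MDP[(s, a)]
        rho = r_learning_step(R, rho, s, a, r, s_next, greedy)
        s = s_next
    return R, rho
```

Calling `run()` with different `epsilon` or `seed` values is a simple way to probe the sensitivity to exploration that the abstract reports; the sketch makes no claim about which settings converge to the gain-optimal policy.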


Journal: Machine Learning
Publication Date: Jan 1, 1996
Citations: 366

Similar Papers

Hierarchical Average Reward Reinforcement Learning
Sridhar Mahadevan ... Mohammad Ghavamzadeh
25 Jun 2003

Improving selection strategies in zeroth-level classifier systems based on average reward reinforcement learning
Zhaoxiang Zang ... Zhao Li
Journal of Ambient Intelligence and Humanized Computing | VOL. 15
30 Jan 2018

Learning classifier system with average reward reinforcement learning
Zhaoxiang Zang ... Dan Xia
Knowledge-Based Systems | VOL. 40
05 Dec 2012

LC-Learning: Phased Method for Average Reward Reinforcement Learning —Preliminary Results —
Taro Konda ... Tomohiro Yamaguchi
01 Jan 2002
