Exploration in Least-Squares Policy Iteration

Lihong Li, Michael L. Littman, Christopher R. Mansley

doi:10.7282/t3xs5zs6

Publication Date: Oct 1, 2008

Citations: 9

Affiliation: Rutgers, The State University of New Jersey

Keywords: Least-Squares Policy Iteration, Markov Decision Processes

Abstract

One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation has to be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). This approach combines the strengths of both methods, and in several benchmark problems it has proven effective, outperforming LSPI paired with two other popular exploration rules.
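To make the idea of combining Rmax with LSPI concrete, here is a minimal sketch under strong simplifying assumptions. It uses a small tabular MDP with one-hot features, so no real function approximation is involved. A state-action pair counts as "known" once it appears in at least `m` samples, and every unknown pair is modelled as a self-loop that pays `r_max` forever, giving it the optimistic value `r_max / (1 - gamma)`. The threshold `m`, this self-loop construction, and the names `lstdq_rmax` and `lspi_rmax` are illustrative assumptions, not the authors' formulation, which targets large or continuous MDPs.

```python
import numpy as np


def lstdq_rmax(samples, policy, n_states, n_actions, gamma, r_max, m):
    """Evaluate `policy` with LSTDQ, treating rarely sampled (s, a) pairs optimistically.

    Illustrative assumptions (not the paper's exact method):
    - tabular one-hot features, so Q is a table of n_states * n_actions weights;
    - a pair (s, a) is "known" once it appears in at least m samples;
    - an unknown pair is a self-loop paying r_max, so its value is r_max / (1 - gamma).
    """
    k = n_states * n_actions

    def idx(s, a):
        # Position of the single nonzero entry of the one-hot feature phi(s, a).
        return s * n_actions + a

    counts = np.zeros(k, dtype=int)
    for s, a, _, _ in samples:
        counts[idx(s, a)] += 1
    known = counts >= m

    A = np.zeros((k, k))
    b = np.zeros(k)
    # Standard LSTDQ update, using only samples from known pairs:
    # A += phi(s,a) (phi(s,a) - gamma * phi(s', pi(s')))^T,  b += phi(s,a) * r
    for s, a, r, s_next in samples:
        i = idx(s, a)
        if known[i]:
            A[i, i] += 1.0
            A[i, idx(s_next, policy[s_next])] -= gamma
            b[i] += r
    # Rmax-style optimism: each unknown pair becomes a self-loop with reward r_max.
    for i in np.flatnonzero(~known):
        A[i, i] += 1.0 - gamma
        b[i] += r_max
    return np.linalg.solve(A, b).reshape(n_states, n_actions)


def lspi_rmax(samples, n_states, n_actions, gamma=0.9, r_max=1.0, m=1, max_iter=50):
    """Policy iteration that alternates optimistic LSTDQ evaluation and greedy improvement."""
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        q = lstdq_rmax(samples, policy, n_states, n_actions, gamma, r_max, m)
        new_policy = q.argmax(axis=1)  # unknown pairs look attractive, driving exploration
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, q


# Only (s=0, a=0) has been tried: it pays 0 and stays in state 0.
policy, q = lspi_rmax([(0, 0, 0.0, 0)], n_states=2, n_actions=2)
print(policy)  # [1 0]: the untried action 1 is preferred in state 0
```

In this toy run the untried pair `(0, 1)` is valued at `r_max / (1 - gamma) = 10`, above the tried pair's value of 9, so the greedy policy chooses to explore it. Once enough samples make every pair known, the optimistic rows disappear and the evaluation reduces to plain LSTDQ.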

Full Text