Abstract

Reinforcement learning (RL) is a general framework for acquiring intelligent behavior by trial and error, and many successful applications and impressive results have been reported in the field of robotics. Robot control problems are typically characterized by the need to learn online, through interaction with the system while it is operating, and by continuous state and action spaces. Least-squares policy iteration (LSPI) based approaches are therefore particularly hard to employ in practice, and parameter tuning is a tedious and costly enterprise. To mitigate this problem, we derive an automatic online LSPI algorithm that operates over continuous action spaces and does not require an a priori, hand-tuned value function approximation (VFA) architecture. To this end, we first show how the kernel least-squares policy iteration (KLSPI) algorithm can be modified to handle data online by recursive dictionary and learning update rules. Next, borrowing sparsification methods from kernel adaptive filtering, the continuous action-space approximation in the online least-squares policy iteration algorithm can be efficiently automated as well. We then propose a similarity-based information extrapolation for the recursive temporal-difference update in order to perform the dictionary expansion step efficiently in both algorithms. In a simulation study, the performance of the proposed algorithms is compared to that of their batch or hand-tuned counterparts. The novel algorithms require less prior tuning and process data completely on the fly, yet the results indicate that performance similar to careful hand-tuning can be obtained. Engineers from both robotics and AI can therefore benefit from the proposed algorithms whenever an LSPI algorithm faces online data collection and tuning by experiment is costly.
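
To illustrate the kind of sparsification rule borrowed from kernel adaptive filtering, the following Python sketch implements an approximate linear dependence (ALD) novelty test for growing a kernel dictionary online. The Gaussian kernel, its width sigma, and the novelty threshold nu are illustrative assumptions, not the paper's exact criterion or parameters:

    import numpy as np

    def gauss_kernel(x, y, sigma=0.5):
        # Gaussian kernel; the width sigma is an illustrative choice.
        return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                      / (2.0 * sigma ** 2))

    def ald_test(dictionary, x_new, nu=1e-3, sigma=0.5):
        """Approximate linear dependence (ALD) novelty test.

        Returns True if x_new is sufficiently novel, i.e. its
        feature-space image cannot be approximated well by the
        span of the current dictionary elements.
        """
        if not dictionary:
            return True
        K = np.array([[gauss_kernel(xi, xj, sigma) for xj in dictionary]
                      for xi in dictionary])
        k = np.array([gauss_kernel(xi, x_new, sigma) for xi in dictionary])
        # Best least-squares reconstruction of x_new's image
        # from the dictionary (small jitter for numerical stability)
        a = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        # Residual of that reconstruction in feature space
        delta = gauss_kernel(x_new, x_new, sigma) - k @ a
        return delta > nu

    # Streaming demo on random 2-D samples (illustrative only)
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        dictionary = []
        for x in rng.uniform(-1.0, 1.0, size=(200, 2)):
            if ald_test(dictionary, x):
                dictionary.append(x)
        print(f"dictionary size: {len(dictionary)} of 200 samples")

In the online algorithms, an incoming sample is appended to the dictionary only when such a test fires, which keeps the approximation architecture compact without fixing its size by hand in advance.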

Highlights

  • For many robotic tasks, detailed mathematical modeling is hard or time-consuming, which makes reinforcement learning (RL) an attractive alternative to model-based control design

  • We investigate the well-known least-squares policy iteration algorithms kernel-based least-squares policy iteration (KLSPI) and online least-squares policy iteration (OLSPI) in view of their applicability to intelligent real-time automation, e.g., robotic control problems

  • The KLSPI algorithm is reformulated for incremental data collection, yielding the proposed online KLSPI (OKLSPI) for online usage; a sketch of the recursive update idea follows below
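
As a minimal sketch of the incremental reformulation named in the last highlight, the following Python class performs a recursive least-squares temporal-difference (LSTD-Q) weight update via the Sherman-Morrison identity. The feature map phi(s, a), the regularization parameter beta, and the class name RecursiveLSTDQ are hypothetical choices for illustration; in the kernel variant, phi would be the kernel vector of the sample against the current dictionary:

    import numpy as np

    class RecursiveLSTDQ:
        """Recursive LSTD-Q update via the Sherman-Morrison identity.

        Maintains P ~ inverse(A) so each transition costs O(n^2)
        instead of re-solving the regularized least-squares problem
        from scratch.
        """

        def __init__(self, n_features, beta=1.0, gamma=0.95):
            self.P = np.eye(n_features) / beta  # inverse of regularized A
            self.b = np.zeros(n_features)       # reward-weighted features
            self.w = np.zeros(n_features)       # value-function weights
            self.gamma = gamma

        def update(self, phi, phi_next, reward):
            # Temporal-difference feature difference under the current policy
            d = phi - self.gamma * phi_next
            # Rank-one Sherman-Morrison update of P = inverse(A + phi d^T)
            P_phi = self.P @ phi
            self.P -= np.outer(P_phi, d @ self.P) / (1.0 + d @ P_phi)
            self.b += reward * phi
            self.w = self.P @ self.b
            return self.w

Maintaining the inverse directly is what makes online operation on a running system feasible: no batch of transitions ever has to be stored and re-solved.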


Summary

Introduction

For many robotic tasks, detailed mathematical modeling is hard or time-consuming, which makes reinforcement learning (RL) an attractive alternative to model-based control design. Learned value functions can be turned into controllers, e.g., by discretization and subsequent symbolic post-processing (Alibekov et al., 2018) or heuristically by expert knowledge and fuzzy representations (Hourfar et al., 2019). Despite their drawbacks, value-function-based algorithms are preferred in some robotic applications in order to avoid the limitations of policy search; see Kober et al. (2013, Tab. 1). However, Anderlini et al. (2017) report unexpected behavior of LSPI in the control of a wave energy converter model, presumably due to the radial basis function approximation. In robotics, this issue can become even more tedious when tuning the algorithmic parameters is costly in experimental setups where merely collecting suitable data can be hard. To leverage the potential of LSPI in robotics, algorithms are needed that operate online, over continuous state and action spaces, and automatically handle the VFA.

Related work
Contributions
Reinforcement learning
Kernel-based policy iteration
Problem statement
Online kernel least-squares policy iteration
Sparsification rule
Online dictionary expansion
Automated online least-squares policy iteration
Similarity-based information extrapolation in TD update
Convergence analysis
Complexity analysis and optimized implementation
Simulation study example
OKLSPI and the car on the hill problem
AOLSPI controlling the inverted pendulum
Additional discussion of the similarity-based extrapolation
Summary and future work