Abstract

Reinforcement learning (RL) is a general framework for acquiring intelligent behavior by trial and error, and many successful applications and impressive results have been reported in the field of robotics. Robot control problems are typically characterized by the need to learn online, through interaction with the system while it is operating, and by continuous state and action spaces. Least-squares policy iteration (LSPI) based approaches are therefore particularly hard to employ in practice, and parameter tuning is a tedious and costly enterprise. To mitigate this problem, we derive an automatic online LSPI algorithm that operates over continuous action spaces and does not require an a priori, hand-tuned value function approximation (VFA) architecture. To this end, we first show how the kernel least-squares policy iteration (KLSPI) algorithm can be modified to handle data online by recursive dictionary and learning update rules. Next, borrowing sparsification methods from kernel adaptive filtering, the continuous action-space approximation in the online least-squares policy iteration algorithm can be efficiently automated as well. We then propose a similarity-based information extrapolation for the recursive temporal-difference update in order to perform the dictionary expansion step efficiently in both algorithms. In a simulation study, the performance of the proposed algorithms is compared to that of their batch or hand-tuned counterparts. The novel algorithms require less prior tuning and process data completely on the fly, yet the results indicate that performance similar to careful hand-tuning can be obtained. Engineers from both robotics and AI can therefore benefit from the proposed algorithms whenever an LSPI algorithm faces online data collection and tuning by experiment is costly.
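
To illustrate the kind of sparsification rule borrowed from kernel adaptive filtering, the following Python sketch implements an approximate linear dependence (ALD) novelty test for growing a kernel dictionary online. The Gaussian kernel, its width sigma, and the novelty threshold nu are illustrative assumptions, not the paper's exact criterion or parameters:

    import numpy as np

    def gauss_kernel(x, y, sigma=0.5):
        # Gaussian kernel; the width sigma is an illustrative choice.
        return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                      / (2.0 * sigma ** 2))

    def ald_test(dictionary, x_new, nu=1e-3, sigma=0.5):
        """Approximate linear dependence (ALD) novelty test.

        Returns True if x_new is sufficiently novel, i.e. its
        feature-space image cannot be approximated well by the
        span of the current dictionary elements.
        """
        if not dictionary:
            return True
        K = np.array([[gauss_kernel(xi, xj, sigma) for xj in dictionary]
                      for xi in dictionary])
        k = np.array([gauss_kernel(xi, x_new, sigma) for xi in dictionary])
        # Best least-squares reconstruction of x_new's image
        # from the dictionary (small jitter for numerical stability)
        a = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        # Residual of that reconstruction in feature space
        delta = gauss_kernel(x_new, x_new, sigma) - k @ a
        return delta > nu

    # Streaming demo on random 2-D samples (illustrative only)
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        dictionary = []
        for x in rng.uniform(-1.0, 1.0, size=(200, 2)):
            if ald_test(dictionary, x):
                dictionary.append(x)
        print(f"dictionary size: {len(dictionary)} of 200 samples")

In the online algorithms, an incoming sample is appended to the dictionary only when such a test fires, which keeps the approximation architecture compact without fixing its size by hand in advance.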

Highlights

  • For many robotic tasks, detailed mathematical modeling is hard or time-consuming, which makes reinforcement learning (RL) an attractive alternative to model-based control design

  • We investigate the well-known least-squares policy iteration algorithms kernel-based least-squares policy iteration (KLSPI) and online least-squares policy iteration (OLSPI) in view of their applicability to intelligent real-time automation, e.g., robotic control problems

  • The KLSPI algorithm is reformulated for incremental data collection, yielding the proposed online KLSPI (OKLSPI) for online usage; a sketch of the recursive update idea follows below
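
As a minimal sketch of the incremental reformulation named in the last highlight, the following Python class performs a recursive least-squares temporal-difference (LSTD-Q) weight update via the Sherman-Morrison identity. The feature map phi(s, a), the regularization parameter beta, and the class name RecursiveLSTDQ are hypothetical choices for illustration; in the kernel variant, phi would be the kernel vector of the sample against the current dictionary:

    import numpy as np

    class RecursiveLSTDQ:
        """Recursive LSTD-Q update via the Sherman-Morrison identity.

        Maintains P ~ inverse(A) so each transition costs O(n^2)
        instead of re-solving the regularized least-squares problem
        from scratch.
        """

        def __init__(self, n_features, beta=1.0, gamma=0.95):
            self.P = np.eye(n_features) / beta  # inverse of regularized A
            self.b = np.zeros(n_features)       # reward-weighted features
            self.w = np.zeros(n_features)       # value-function weights
            self.gamma = gamma

        def update(self, phi, phi_next, reward):
            # Temporal-difference feature difference under the current policy
            d = phi - self.gamma * phi_next
            # Rank-one Sherman-Morrison update of P = inverse(A + phi d^T)
            P_phi = self.P @ phi
            self.P -= np.outer(P_phi, d @ self.P) / (1.0 + d @ P_phi)
            self.b += reward * phi
            self.w = self.P @ self.b
            return self.w

Maintaining the inverse directly is what makes online operation on a running system feasible: no batch of transitions ever has to be stored and re-solved.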


Summary

Introduction

For many robotic tasks, detailed mathematical modeling is hard or time-consuming, which makes reinforcement learning (RL) an attractive alternative to model-based control design. Learned value functions can be turned into controllers, e.g., by discretization and subsequent symbolic post-processing (Alibekov et al., 2018) or heuristically by expert knowledge and fuzzy representations (Hourfar et al., 2019). Despite their drawbacks, value-function-based algorithms are preferred in some robotic applications in order to avoid the limitations of policy search; see Kober et al. (2013, Tab. 1). However, Anderlini et al. (2017) report unexpected behavior of LSPI in the control of a wave energy converter model, presumably due to the radial basis function approximation. In robotics, this issue can become even more tedious when tuning the algorithmic parameters is costly in experimental setups where merely collecting suitable data can be hard. To leverage the potential of LSPI in robotics, algorithms are needed that operate online, over continuous state and action spaces, and automatically handle the VFA.

Related work
Contributions
Reinforcement learning
Kernel-based policy iteration
Problem statement
Online kernel least-squares policy iteration
Sparsification rule
Online dictionary expansion
Automated online least-squares policy iteration
Similarity-based information extrapolation in TD update
Convergence analysis
Complexity analysis and optimized implementation
Simulation study example
OKLSPI and the car on the hill problem
AOLSPI controlling the inverted pendulum
Additional discussion of the similarity-based extrapolation
Summary and future work