Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures.

Johannes Günther, Nadia M Ady, Patrick M Pilarski, Michael R Dawson, Alex Kearney

doi:10.3389/frobt.2020.00034

Open Access

https://doi.org/10.3389/frobt.2020.00034

Journal: Frontiers in Robotics and AI
Publication Date: Mar 13, 2020
Citations: 7
License type: CC BY 4.0

Affiliation: University of Alberta

Abstract

Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well-suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rates or step sizes). Typically, these parameters are chosen based on an extensive parameter search, an approach that neither scales well nor is well-suited for tasks that require changing step sizes due to non-stationarity. To begin to address this challenge, we examine the use of online step-size adaptation using the Modular Prosthetic Limb: a sensor-rich robotic arm intended for use by persons with amputations. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. As a first contribution, we show that TIDBD is a practical alternative to classic Temporal-Difference (TD) learning whose step size is chosen via an extensive parameter search. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream, but TD only achieves comparable performance with a carefully hand-tuned learning rate, while TIDBD uses a robust meta-parameter and tunes its own learning rates. Secondly, our results show that for this particular application TIDBD allows the system to automatically detect patterns characteristic of sensor failures common to a number of robotic applications. As a third contribution, we investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.
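To make the idea of per-feature step-size adaptation concrete, the following is a minimal sketch of one update of linear TD(λ) with TIDBD-style step sizes. The update rule is not given on this page; it is written from the semi-gradient TIDBD(λ) form as commonly presented in the TIDBD literature and should be read as an assumption, not as the authors' implementation. The function name `tidbd_step`, the toy data stream, and all parameter values are hypothetical.

```python
import math

def tidbd_step(w, beta, h, z, x, x_next, reward, gamma, lam, theta):
    """One assumed semi-gradient TIDBD(lambda) update for linear TD.

    w: weights, beta: log step sizes, h: per-feature memory of recent updates,
    z: eligibility traces. All four lists are modified in place.
    Returns the TD error.
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v  # TD error
    for i in range(len(w)):
        # Meta-update: raise the log step size when the current error
        # correlates with the recent update history of this feature.
        beta[i] += theta * delta * x[i] * h[i]
        alpha = math.exp(beta[i])  # per-feature step size, always positive
        z[i] = gamma * lam * z[i] + x[i]  # accumulating trace
        w[i] += alpha * delta * z[i]
        # Decay the memory (clipped at zero) and add the newest weight change.
        h[i] = h[i] * max(0.0, 1.0 - alpha * x[i] * z[i]) + alpha * delta * z[i]
    return delta

# Toy stream (assumption): feature 0 is always active, feature 1 never is,
# and the signal to predict is a constant 1.0.
w = [0.0, 0.0]
beta = [math.log(0.1), math.log(0.1)]  # initial step size 0.1 per feature
h = [0.0, 0.0]
z = [0.0, 0.0]
for _ in range(500):
    tidbd_step(w, beta, h, z, [1.0, 0.0], [1.0, 0.0],
               reward=1.0, gamma=0.5, lam=0.0, theta=0.01)

print(w[0], [math.exp(b) for b in beta])
```

On this toy stream the weight of the active feature approaches the true discounted value 1 / (1 - 0.5) = 2, its step size grows slightly above the initial 0.1, and the step size of the never-active feature is left unchanged, which illustrates what "on a feature level" means here.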

Highlights

Reviewed by: Eiji Uchibe, Advanced Telecommunications Research Institute International (ATR), Japan; Martin Lauer, Karlsruhe Institute of Technology (KIT), Germany
All four experiments utilize the data from alternating patterns of rest and movement. These four experiments result in three contributions. First, we demonstrate Temporal-Difference Incremental Delta-Bar-Delta (TIDBD) to be a practical alternative to an extensive step-size parameter search.
The results show that TIDBD and classic TD performed comparably in terms of the root mean squared error (RMSE).

Summary

Introduction

We investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots. As an agent’s actions have an effect on the environment, these forecasts about what will happen are made with respect to a policy of agent behavior (nexting, as described by Modayil et al., 2014). In this way, these predictions can capture forward-looking aspects of the environment, such as “If I continue moving my arm to the right, how much load do I expect my elbow servo to experience?” For a concrete example of predictions being used to support robot control, we consider the idea of Pavlovian control, as defined by Modayil and Sutton (2014), wherein learned predictions about what will happen are mapped in predefined or fixed ways to changes in a system’s control behaviors. Without using predictions to alter actions, a collision would need to occur before the robot would be able to take action in response to it.
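The difference between reacting to a signal and reacting to a prediction of that signal can be illustrated with a small sketch. It is not the authors' controller: the load values, the discount factor, and the threshold are hypothetical, and the "prediction" is the ideal discounted sum of future load computed in hindsight, standing in for what a learned nexting prediction would approximate online.

```python
def discounted_returns(signal, gamma):
    """Ideal prediction target: G[t] = sum over k >= 0 of gamma**k * signal[t + k + 1]."""
    returns = [0.0] * len(signal)
    g = 0.0
    # Work backwards so each return reuses the one after it.
    for t in range(len(signal) - 1, -1, -1):
        returns[t] = g
        g = signal[t] + gamma * g
    return returns

def first_trigger(values, threshold):
    """Index of the first value above the threshold, or None if there is none."""
    for t, v in enumerate(values):
        if v > threshold:
            return t
    return None

# Hypothetical servo load: the arm meets an obstacle at time step 5.
load = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
prediction = discounted_returns(load, gamma=0.5)

# Fixed (Pavlovian-style) rule: trigger a predefined response, such as
# stopping, as soon as the monitored quantity exceeds a threshold.
reactive_step = first_trigger(load, 0.4)          # rule applied to the raw load
predictive_step = first_trigger(prediction, 0.4)  # rule applied to the prediction

print(reactive_step, predictive_step)
```

With these numbers the rule applied to the raw load fires at step 5, when the contact has already happened, while the same rule applied to the prediction fires at step 2. The two thresholds are on different scales in general and would need to be chosen separately in practice.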

