Abstract

Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well-suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rates or step sizes). Typically, these parameters are chosen based on an extensive parameter search—an approach that neither scales well nor is well-suited for tasks that require changing step sizes due to non-stationarity. To begin to address this challenge, we examine the use of online step-size adaptation using the Modular Prosthetic Limb: a sensor-rich robotic arm intended for use by persons with amputations. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. As a first contribution, we show that TIDBD is a practical alternative for classic Temporal-Difference (TD) learning via an extensive parameter search. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream, but TD only achieves comparable performance with a carefully hand-tuned learning rate, while TIDBD uses a robust meta-parameter and tunes its own learning rates. Secondly, our results show that for this particular application TIDBD allows the system to automatically detect patterns characteristic of sensor failures common to a number of robotic applications. As a third contribution, we investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.

Highlights

  • Reviewed by: Eiji Uchibe, Advanced Telecommunications Research Institute International (ATR), Japan Martin Lauer, Karlsruhe Institute of Technology (KIT), Germany

  • All four experiments utilize the data from alternating patterns of rest and movement. These four experiments result in three contributions: First, we demonstrate Temporal-Difference Incremental Delta-Bar-Delta (TIDBD) to be a practical alternative to an extensive step size parameter search

  • The results show that TIDBD and classic TD performed comparably in terms of the root mean squared error (RMSE)

Read more

Summary

Introduction

We investigate the sensitivity of classic TD and TIDBD with respect to the initial step-size values on our robotic data set, reaffirming the robustness of TIDBD as shown in previous papers Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots. As an agent’s actions have an effect on the environment, these forecasts about what will happen are made with consideration to a policy of agent behavior (nexting, as described by Modayil et al, 2014) In this way, these predictions can capture forward-looking aspects of the environment, such as “If I continue moving my arm to the right, how much load do I expect my elbow servo to experience?” For a concrete example of predictions being used to support robot control, we consider the idea of Pavlovian control, as defined by Modayil and Sutton (2014), wherein learned predictions about what will happen are mapped in predefined or fixed ways to changes in a system’s control behaviors. Without using predictions to alter actions, a collision would need to occur before the robot would be able to take action in response to it

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.