Abstract

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder that is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal that is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm on the task of learning robotic reaching and grasping skills in a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
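As a rough illustration of the architecture described above, the sketch below shows a convolutional autoencoder whose centre-most (bottleneck) representation is shared by an actor head and a critic head. It is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the 64x64 RGB input resolution, layer sizes, and latent dimensionality are assumptions made only for illustration.

import torch
import torch.nn as nn

class AutoencoderActorCritic(nn.Module):
    # Hypothetical sketch: a convolutional autoencoder whose bottleneck feeds
    # both the actor and the critic; all sizes below are illustrative assumptions.
    def __init__(self, action_dim, latent_dim=64):
        super().__init__()
        # Encoder for assumed 64x64 RGB observations -> latent bottleneck
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2), nn.ReLU(),   # 64 -> 31
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 31 -> 14
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, latent_dim),                    # centre-most layer
        )
        # Decoder reconstructs the observation from the bottleneck
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 14 * 14), nn.ReLU(),
            nn.Unflatten(1, (32, 14, 14)),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, output_padding=1), nn.ReLU(),  # 14 -> 31
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2), nn.Sigmoid(),                  # 31 -> 64
        )
        # Actor and critic both read the shared latent representation
        self.actor = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                   nn.Linear(128, action_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                    nn.Linear(128, 1))

    def forward(self, obs):
        z = self.encoder(obs)        # shared hidden representation
        recon = self.decoder(z)      # optimized with a reconstruction loss
        action = self.actor(z)       # continuous action in [-1, 1]
        value = self.critic(z)       # state-value estimate
        return recon, action, value, z

In the setting described by the abstract, the bottleneck z would be trained jointly on the reconstruction and value-estimation objectives, so the representation fed to the actor is shaped by both.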

Highlights

  • To improve sample efficiency in deep Reinforcement Learning (RL), different approaches have recently been proposed

  • An autonomous agent learning control skills from trial and error in an unknown environment with zero prior knowledge is faced with the challenging task of correctly …

  • In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input


Summary

Introduction

To improve sample efficiency in deep RL, different approaches have recently been proposed. Schaul et al. pointed out that, in most deep RL methods, transitions are randomly drawn from a replay buffer of recent transitions whenever a learning update of the network weights is performed. Instead of this inefficient uniform sampling, they proposed Prioritized Experience Replay, where each transition in the buffer is assigned a sampling probability proportional to its temporal-difference error [2]. Another line of work uses the Successor Representation (SR): the agent learns an estimate of the expectation over the future state representations from a given state and action. This allows the state-action value function, which estimates the expected future reward, to be replaced by a function estimating only the immediate reward combined with the SR, thereby eliminating the need for the slow propagation of state-action values among visited states.
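To make the replay mechanism concrete, here is a small, self-contained sketch of proportional prioritized sampling in the spirit of Schaul et al. [2]; it is illustrative only, and the buffer capacity, the exponent alpha, and the constant eps are assumed values rather than ones taken from that work.

import numpy as np

class ProportionalReplayBuffer:
    # Toy prioritized replay: each transition is sampled with probability
    # proportional to (|TD error| + eps) ** alpha; all constants are assumptions.
    def __init__(self, capacity=10000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.transitions, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:   # drop the oldest transition
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the learner recomputes the TD errors
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha

The successor-representation idea mentioned above can likewise be stated compactly: with a successor matrix M(s, s') giving the expected discounted future occupancy of s' when starting from s, the value decomposes as V(s) = sum over s' of M(s, s') r(s'), so that once M is learned only the immediate reward r has to be estimated.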


