Abstract

Continuous action spaces pose a serious challenge for reinforcement learning agents. While several off-policy reinforcement learning algorithms provide a universal solution to continuous control problems, the real challenge lies in the fact that different actuators feature different response functions due to wear and tear (in mechanical systems) and fatigue (in biomechanical systems). In this paper, we propose enhancing actor-critic reinforcement learning agents by parameterising the final layer of the actor network. This layer produces the actions and can therefore accommodate the behavioural discrepancies of different actuators under different load conditions during interaction with the environment. To achieve this, the actor is trained to learn the tuning parameters that control the activation functions in this layer (e.g., Tanh and Sigmoid). The learned parameters are then used to create tailored activation functions for each actuator. We ran experiments on three OpenAI Gym environments, i.e., Pendulum-v0, LunarLanderContinuous-v2, and BipedalWalker-v2. Results showed average increases of 23.15% and 33.80% in total episode reward for the LunarLanderContinuous-v2 and BipedalWalker-v2 environments, respectively. There was no apparent improvement in the Pendulum-v0 environment, but the proposed method produced a more stable actuation signal than the state-of-the-art method. The proposed method allows the reinforcement learning actor to produce more robust actions that accommodate the discrepancy in the actuators' response functions. This is particularly useful for real-life scenarios where actuators exhibit different response functions depending on the load and the interaction with the environment. It also simplifies the transfer learning problem: instead of retraining the entire policy every time an actuator is replaced, only the parameterised activation layer needs to be fine-tuned. Finally, the proposed method would allow better accommodation of biological actuators (e.g., muscles) in biomechanical systems.
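The core idea of a parameterised actuation layer can be sketched as a final actor layer whose activation function carries learnable, per-actuator parameters. The following is a minimal, hypothetical PyTorch sketch, not the paper's exact implementation: the per-actuator slope parameter `beta` in the `ParameterisedTanh` module, and the network sizes, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParameterisedTanh(nn.Module):
    """Tanh activation with one learnable tuning parameter per actuator.

    Hypothetical sketch: each action dimension i is squashed by
    tanh(beta_i * x_i), so the actor can adapt the response curve of
    every actuator independently.
    """
    def __init__(self, n_actions: int):
        super().__init__()
        # One tuning parameter per actuator, initialised to the standard Tanh shape.
        self.beta = nn.Parameter(torch.ones(n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.beta * x)

class Actor(nn.Module):
    """DDPG-style actor whose final (actuation) layer uses the parameterised activation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
        self.actuation = ParameterisedTanh(n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.actuation(self.net(obs))
```

Under this sketch, only `beta` changes the shape of each actuator's response, so fine-tuning after an actuator swap could, in principle, be restricted to the actuation layer rather than the whole policy, as the abstract suggests.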

Highlights

  • While deep reinforcement learning (DRL) has proven effective and efficient on discrete problems, continuous control remains challenging. This is because it relies on physical systems that are prone to noise due to wear and tear, overheating, and actuator response functions that change with the load each actuator bears; this is most apparent in robotic and biomechanical control problems

  • We present a modular perspective of the actor in actor-critic DRL agents and propose modifying the actuation layer to learn the parameters defining the activation functions (e.g., Tanh and Sigmoid)

  • We propose parameterised activation functions to improve the performance of the deep deterministic policy gradient (DDPG) to accommodate the complex nature of real-life scenarios

Summary

Introduction

While deep reinforcement learning (DRL) has proven effective and efficient on discrete problems, continuous control remains challenging. This is because it relies on physical systems that are prone to noise due to wear and tear, overheating, and actuator response functions that change with the load each actuator bears; this is most apparent in robotic and biomechanical control problems. The observation, action, reward, and next-state observation are stored as an experience in a circular buffer. This buffer serves as a pool of experiences from which samples are drawn to train the actor and the critic neural networks to produce the correct action and estimate the correct reward, respectively.
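As a concrete illustration of the experience storage described above, here is a minimal sketch of a circular (ring) replay buffer with uniform sampling; the class and method names are assumptions for illustration, not taken from the paper.

```python
import random
from collections import namedtuple

# One experience tuple: (observation, action, reward, next observation, done flag).
Experience = namedtuple("Experience", ["obs", "action", "reward", "next_obs", "done"])

class ReplayBuffer:
    """Fixed-size circular buffer: once full, new experiences overwrite the oldest ones."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = []
        self.position = 0  # index of the next slot to (over)write

    def add(self, obs, action, reward, next_obs, done):
        exp = Experience(obs, action, reward, next_obs, done)
        if len(self.storage) < self.capacity:
            self.storage.append(exp)
        else:
            self.storage[self.position] = exp
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size: int):
        """Draw a uniform random mini-batch to train the actor and critic."""
        return random.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)
```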
