Modeling a Continuous Locomotion Behavior of an Intelligent Agent Using Deep Reinforcement Technique

Stephen Dankwa, Wenfeng Zheng

doi:10.1109/ccet48361.2019.8989177

Abstract

In this work, we applied the Twin-Delayed DDPG (TD3) algorithm to a challenging virtual Artificial Intelligence application: training a HalfCheetah robot as an Intelligent Agent to run across a field. TD3 is a recent Deep Reinforcement Learning model that combines state-of-the-art techniques in Artificial Intelligence, including continuous Double Deep Q-Learning, Policy Gradient, and Actor-Critic. These Deep Reinforcement Learning approaches can train an Intelligent Agent to interact with an environment with automatic feature engineering, that is, requiring minimal domain knowledge. TD3 builds on the Deep Deterministic Policy Gradient (DDPG) algorithm. In our implementation of the TD3 model, we used two-layer feedforward neural networks with 400 and 300 hidden nodes respectively, with Rectified Linear Units (ReLU) as the activation function between layers for both the Actor and the Critics, and a final tanh unit following the output of the Actor. Overall, we developed six (6) neural networks. Each Critic received both the state and the action as input to its first layer. The parameters of both the Actor and Critic networks were updated using the Adam optimizer. The TD3 algorithm was implemented using the PyBullet continuous control environment, interfaced through OpenAI Gym. The idea behind TD3 is to reduce overestimation bias in the value estimates: the remedies developed for Deep Q-Learning with discrete actions are ineffective in an Actor-Critic setting. After 500,000 training iterations, the Agent achieved a Maximum Average Reward over the evaluation time-steps of approximately 1891.
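The network layout described in the abstract can be sketched without any deep learning library. This is a minimal, dependency-free illustration and not the authors' implementation: the state and action dimensions, the weight initialisation, and all function names are assumptions, and reading "six networks" as an Actor, two Critics, and a target copy of each follows the usual TD3 layout rather than anything stated explicitly here.

```python
import copy
import math
import random

# Assumed dimensions for illustration only (not given in the abstract).
STATE_DIM, ACTION_DIM = 26, 6
HIDDEN = (400, 300)  # hidden-layer widths stated in the abstract


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); the uniform init is an arbitrary choice."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def make_mlp(n_in, n_out, rng):
    """Build input -> 400 -> 300 -> output as a list of layers."""
    sizes = (n_in, *HIDDEN, n_out)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(mlp, x, squash=False):
    """ReLU between layers; optional tanh on the output (Actor only)."""
    for layer in mlp[:-1]:
        x = [max(0.0, v) for v in linear(layer, x)]
    out = linear(mlp[-1], x)
    return [math.tanh(v) for v in out] if squash else out


rng = random.Random(0)
actor = make_mlp(STATE_DIM, ACTION_DIM, rng)
# Each Critic takes the concatenated state and action and returns one Q-value.
critics = [make_mlp(STATE_DIM + ACTION_DIM, 1, rng) for _ in range(2)]
# Target copies of the Actor and both Critics bring the total to six.
online = [actor, *critics]
networks = online + [copy.deepcopy(net) for net in online]

state = [rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
action = forward(actor, state, squash=True)
q_values = [forward(critic, state + action)[0] for critic in critics]
```

The tanh on the Actor output keeps every action component in [-1, 1], which is why only the Actor is squashed while the Critics return unbounded Q-values.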
TD3 markedly improved both the learning speed and the performance of DDPG on a challenging continuous control task.
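The way TD3 counters overestimation bias, as mentioned in the abstract, can be sketched as a target computation that takes the smaller of two target Critic estimates, together with a delayed Actor update. This is a hedged, dependency-free sketch: the function names are hypothetical, and the hyperparameter values (discount 0.99, noise 0.2 clipped at 0.5, policy delay 2) are commonly used TD3 defaults, not values reported in this abstract.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target with target policy smoothing (assumed defaults)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             -max_action, max_action)
        for a in target_actor(next_state)
    ]
    # Clipped double Q-learning: use the smaller of the two target estimates.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping from terminal transitions.
    return reward + gamma * (0.0 if done else 1.0) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed updates: refresh the Actor and targets every policy_delay steps."""
    return step % policy_delay == 0


# Toy stand-ins for the target networks (hypothetical, for illustration).
toy_actor = lambda s: [0.0]
toy_q1 = lambda s, a: 10.0
toy_q2 = lambda s, a: 12.0

rng = random.Random(0)
y = td3_target(1.0, False, [0.0], toy_actor, toy_q1, toy_q2, rng)
y_terminal = td3_target(1.0, True, [0.0], toy_actor, toy_q1, toy_q2, rng)
```

In the toy call, the target is built from the lower estimate (10.0) rather than the higher one (12.0), which is the mechanism that limits overestimation.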
