Policy Gradient Method Research Articles

Autonomous learning of robotic skills, analogous to the way humans learn, appears more natural and more practical than hand-engineering those skills. Policy gradient methods are a class of reinforcement learning techniques with great potential for solving robot skill learning problems. However, they require a large number of online interactions between the robot and its environment to learn a good policy, which lowers the efficiency of the learning process and raises the risk of damage to both the robot and the environment. In this paper, we propose a two-phase framework (an imitation phase followed by a practice phase) for efficient learning of robot walking skills that targets both the quality of the learned skill and sample efficiency. Training begins with the first stage, which we call the imitation phase, in which the parameters of the policy network are updated by supervised learning. The training set for this phase consists of trajectories generated by an iterative linear Gaussian controller; we refer to these trajectories as near-optimal experiences. In the second stage, the practice phase, experiences for policy network learning are collected directly from online interactions, and the policy network parameters are updated with model-free reinforcement learning. The experiences from both stages are stored in a weighted replay buffer and ranked by the experience scoring algorithm proposed in this paper. The proposed framework is tested on a biped robot walking task in a MATLAB simulation environment. The results show that the sample efficiency of the proposed framework is much higher than that of ordinary policy gradient algorithms. The proposed algorithm achieved the highest cumulative reward, and the robot autonomously learned better walking skills.
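The two-phase idea described above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the "near-optimal" teacher is a hypothetical fixed linear gain standing in for the iterative linear Gaussian controller, the task is a toy one-step problem, and the practice phase uses plain REINFORCE with a batch-mean baseline rather than the paper's algorithm. All function names and constants are illustrative assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical toy task: state s ~ U(-1, 1), reward -(a + s)^2.
# The best linear policy is a = -1.0 * s.
def reward(s: float, a: float) -> float:
    return -(a + s) ** 2

def teacher_action(s: float) -> float:
    # Hypothetical stand-in for the near-optimal controller: a fixed, slightly suboptimal gain.
    return -0.9 * s

def imitation_phase(n: int, rng: random.Random) -> float:
    """Phase 1: fit the policy mean a = theta * s to teacher actions by least squares."""
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    actions = [teacher_action(s) for s in states]
    return sum(s * a for s, a in zip(states, actions)) / sum(s * s for s in states)

def practice_phase(theta: float, iters: int, batch: int, sigma: float,
                   lr: float, rng: random.Random) -> float:
    """Phase 2: REINFORCE on a Gaussian policy a ~ N(theta * s, sigma^2)."""
    for _ in range(iters):
        samples: List[Tuple[float, float, float]] = []
        for _ in range(batch):
            s = rng.uniform(-1.0, 1.0)            # online interaction: observe a state,
            a = rng.gauss(theta * s, sigma)       # sample an exploratory action,
            samples.append((s, a, reward(s, a)))  # and record the reward
        baseline = sum(r for _, _, r in samples) / batch  # batch-mean baseline
        # Score function: d/dtheta log N(a; theta*s, sigma^2) = (a - theta*s) * s / sigma^2
        grad = sum((r - baseline) * (a - theta * s) * s / sigma ** 2
                   for s, a, r in samples) / batch
        theta += lr * grad  # gradient ascent on expected reward
    return theta

rng = random.Random(0)
theta_bc = imitation_phase(200, rng)
theta_final = practice_phase(theta_bc, iters=300, batch=64, sigma=0.3, lr=0.05, rng=rng)
print(f"theta after imitation: {theta_bc:.3f}, after practice: {theta_final:.3f}")
```

Starting from the imitation estimate, the practice phase only has to close the small gap to the optimum (theta = -1), which is the intuition behind warm-starting a policy gradient learner with near-optimal experiences.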
In addition, the weighted replay buffer method can serve as a general module for other model-free reinforcement learning algorithms. Our framework offers a new way to combine model-based and model-free reinforcement learning to update policy network parameters efficiently during robot skill learning.
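A weighted replay buffer of the kind mentioned above can be sketched as a generic module. The abstract does not specify the experience scoring algorithm, so this sketch assumes a hypothetical score equal to a trajectory's undiscounted return, keeps experiences sorted by that score, evicts the lowest-scored one when full, and samples with rank-based weights 1/rank^alpha (borrowed from rank-based prioritized replay). `WeightedReplayBuffer`, `trajectory_score`, and `alpha` are illustrative names, not taken from the paper.

```python
import bisect
import random
from dataclasses import dataclass, field
from typing import Any, List

@dataclass(order=True)
class Experience:
    score: float                                   # only field used for ordering
    data: Any = field(compare=False)               # e.g. a trajectory or transition
    source: str = field(compare=False, default="practice")

def trajectory_score(rewards: List[float]) -> float:
    # Hypothetical scoring rule: the undiscounted return of the trajectory.
    return sum(rewards)

class WeightedReplayBuffer:
    """Keeps experiences sorted by score and samples higher-ranked ones more often."""

    def __init__(self, capacity: int, alpha: float = 0.7, seed: int = 0) -> None:
        self.capacity = capacity
        self.alpha = alpha                   # 0 = uniform sampling, larger = greedier
        self.items: List[Experience] = []    # ascending by score
        self.rng = random.Random(seed)

    def add(self, score: float, data: Any, source: str = "practice") -> None:
        bisect.insort(self.items, Experience(score, data, source))
        if len(self.items) > self.capacity:
            self.items.pop(0)                # evict the lowest-scored experience

    def ranked(self) -> List[Experience]:
        return list(reversed(self.items))    # best first

    def sample(self, k: int) -> List[Experience]:
        ranked = self.ranked()
        weights = [1.0 / (rank + 1) ** self.alpha for rank in range(len(ranked))]
        return self.rng.choices(ranked, weights=weights, k=k)

buf = WeightedReplayBuffer(capacity=3)
buf.add(trajectory_score([1.0, 1.0]), "teacher-traj", source="imitation")
buf.add(trajectory_score([0.5]), "rollout-1")
buf.add(trajectory_score([3.0]), "rollout-2")
buf.add(trajectory_score([0.1]), "rollout-3")  # buffer full: the 0.1-score entry is evicted
print([e.data for e in buf.ranked()])  # ['rollout-2', 'teacher-traj', 'rollout-1']
```

Because the sampling weights depend only on rank, this design is insensitive to the scale of the scores, and setting `alpha=0` recovers uniform sampling.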

Recent learning strategies such as reinforcement learning (RL) have supported the transition from applied artificial intelligence toward general artificial intelligence. One current challenge for RL in healthcare is developing a controller that teaches a musculoskeletal model to perform dynamic movements. Several solutions have been proposed; however, investigations exploring the muscle control problem from a biomechanical point of view are still lacking. Moreover, no study that uses biological knowledge to develop plausible motor control models for pathophysiological conditions has made use of reward reshaping. Consequently, the objective of the present work was to design and evaluate specific bioinspired reward function strategies for human locomotion learning within an RL framework. The deep deterministic policy gradient (DDPG) method was applied to a single-agent RL problem. A 3D musculoskeletal model (8 DoF and 22 muscles) of a healthy adult was used. A virtual interactive environment was developed and simulated using the opensim-rl library. Three reward functions were defined: one for walking, one for a forward fall, and one for a side fall. Training was performed on Google Cloud Compute Engine. The outcomes were compared with the NIPS 2017 challenge outcomes, experimental observations, and literature data. For walking, the best solutions enabled the simulated musculoskeletal model to walk between 18 and 20.5 m. A compensation strategy in muscle activations was revealed. The soleus, tibialis anterior, and vasti muscles are the main actors in the simple forward fall, and muscle activation intensity was higher after the fall. All kinematics and muscle patterns were consistent with experimental observations and literature data. For the side fall, an intense level of muscle activation was observed on the expected fall side, serving to unbalance the body.
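Two pieces of DDPG, and the point where a reshaped reward enters it, can be sketched briefly. The abstract does not give the bioinspired reward functions, so this sketch uses generic potential-based shaping, r + gamma * Phi(s') - Phi(s), as a plainly swapped-in stand-in; the potential, the linear "target networks", and all numbers are hypothetical. Only the critic's TD target and the Polyak (soft) target-network update are shown; the actor update, which follows the gradient of Q with respect to the action through the deterministic policy, is omitted.

```python
from typing import Callable, List

def shaped_reward(r: float, phi_s: float, phi_next: float, gamma: float) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * phi_next - phi_s

def critic_target(r: float, done: bool, s_next: List[float],
                  target_actor: Callable[[List[float]], List[float]],
                  target_critic: Callable[[List[float], List[float]], float],
                  gamma: float) -> float:
    """DDPG TD target: y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    if done:
        return r
    return r + gamma * target_critic(s_next, target_actor(s_next))

def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging of target parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]

# Hypothetical linear target networks on a 2-D state and 1-D action.
actor_w = [0.5, -0.2]
critic_w = [1.0, 0.5, 2.0]  # weights on (s0, s1, a)

def target_actor(s: List[float]) -> List[float]:
    return [actor_w[0] * s[0] + actor_w[1] * s[1]]

def target_critic(s: List[float], a: List[float]) -> float:
    return critic_w[0] * s[0] + critic_w[1] * s[1] + critic_w[2] * a[0]

def potential(s: List[float]) -> float:
    # Hypothetical potential: forward progress measured by s[0].
    return s[0]

s, s_next, r, gamma = [0.0, 1.0], [0.2, 1.0], 0.1, 0.99
r_shaped = shaped_reward(r, potential(s), potential(s_next), gamma)
y = critic_target(r_shaped, False, s_next, target_actor, target_critic, gamma)
print(f"{y:.3f}")  # 0.793
```

The critic would then be regressed toward `y`, and `soft_update` would be applied to the target parameters after each gradient step with a small `tau`.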
The outcomes suggest that computational and human resources, together with biomechanical knowledge, are needed to develop and evaluate an efficient and robust RL solution. As future work, the current solutions will be extended to a larger parameter space in 3D. Furthermore, a stochastic reinforcement learning model will be investigated to account for the uncertainties of the musculoskeletal model and its environment, with the aim of providing a general artificial intelligence solution for human locomotion learning.

Articles published on Policy Gradient Method

Self-play reinforcement learning with comprehensive critic in computer games

Efficient Robot Skills Learning with Weighted Near-Optimal Experiences Policy Optimization

Hybrid deep reinforcement learning based eco-driving for low-level connected and automated vehicles along signalized corridors

Two-Stage Volt/Var Control in Active Distribution Networks With Multi-Agent Deep Reinforcement Learning Method

Workflow scheduling based on deep reinforcement learning in the cloud environment

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Policy Gradient Importance Sampling for Bayesian Inference

Trajectory Based Prioritized Double Experience Buffer for Sample-Efficient Policy Optimization

Human locomotion with reinforcement learning using bioinspired reward reshaping strategies.

Deep Reinforcement Learning Based Energy Efficient Multi-UAV Data Collection for IoT Networks

Resilience Microgrid as Power System Integrity Protection Scheme Element With Reinforcement Learning Based Management

Deep Learning-Based Energy Management of an All-Electric City Bus With Wireless Power Transfer

A Hybrid Tracking Control Strategy for Nonholonomic Wheeled Mobile Robot Incorporating Deep Reinforcement Learning Approach

Simultaneous Process Design and Control Optimization using Reinforcement Learning

Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon

Policy gradient methods for free-electron laser and terahertz source optimization and stabilization at the FERMI free-electron laser at Elettra

A Method for Synthesizing Neural Controllers for Linear Plants (Метод синтеза нейронных регуляторов для линейных объектов)

Reinforcement learning-based hybrid spectrum resource allocation scheme for the high load of URLLC services

Multi-Agent Deep Reinforcement Learning-Based Trajectory Planning for Multi-UAV Assisted Mobile Edge Computing

MobileVisFixer: Tailoring Web Visualizations for Mobile Phones Leveraging an Explainable Reinforcement Learning Framework.
