Model-free Reinforcement Learning Algorithm Research Articles

Autonomous learning of robotic skills, analogous to the way humans learn, appears more natural and more practical than hand-engineered skills. Policy gradient methods are a class of reinforcement learning techniques with great potential for solving robot skill learning problems. However, they require a large number of online interactions between the robot and its environment to learn a good policy, which lowers learning efficiency and raises the risk of damage to both the robot and the environment. In this paper, we propose a two-phase framework (an imitation phase and a practice phase) for efficient learning of robot walking skills that targets skill quality and sample efficiency at the same time. Training begins with the first stage, which we call the imitation phase, in which the parameters of the policy network are updated by supervised learning. The training set for this phase consists of trajectories produced by the iterative linear Gaussian controller; this paper refers to these trajectories as near-optimal experiences. In the second stage, the practice phase, experiences are collected directly from online interaction, and the policy network parameters are updated with model-free reinforcement learning. Experiences from both stages are stored in a weighted replay buffer and ranked according to the experience scoring algorithm proposed in this paper. The proposed framework is tested on a biped robot walking task in a MATLAB simulation environment. The results show that its sample efficiency is much higher than that of ordinary policy gradient algorithms. The proposed algorithm achieved the highest cumulative reward, and the robot autonomously learned better walking skills.
In addition, the weighted replay buffer can serve as a general module for other model-free reinforcement learning algorithms. Our framework offers a new way to combine model-based and model-free reinforcement learning to update policy network parameters efficiently during robot skill learning.
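
The abstract names a weighted replay buffer whose experiences are ordered by an experience scoring algorithm, but it does not describe the scoring rule. The Python sketch below therefore assumes the caller supplies one scalar score per experience (for example, a return estimate). It keeps the buffer sorted by that score, evicts the lowest-scored entry when full, and samples with rank-based weights 1/(rank + 1), a scheme borrowed from rank-based prioritized experience replay rather than taken from the paper. `Experience`, `WeightedReplayBuffer`, and its methods are hypothetical names.

```python
import bisect
import itertools
import random
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Experience:
    state: Any
    action: Any
    reward: float
    next_state: Any
    source: str  # "imitation" (controller trajectories) or "practice" (online)


class WeightedReplayBuffer:
    """Hypothetical score-ordered replay buffer; the scoring rule is supplied by the caller."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        # Entries are (-score, insertion_id, experience), so ascending order = best score first.
        self._items: List[Tuple[float, int, Experience]] = []
        self._ids = itertools.count()  # unique tie-breaker: Experience objects are never compared
        self._rng = random.Random(seed)

    def add(self, exp: Experience, score: float) -> bool:
        """Store exp in score order; when full, replace the worst entry only if exp scores higher."""
        if len(self._items) >= self.capacity:
            if score <= -self._items[-1][0]:
                return False  # no better than the current worst entry; discard
            self._items.pop()  # evict the lowest-scored entry
        bisect.insort(self._items, (-score, next(self._ids), exp))
        return True

    def sample(self, batch_size: int) -> List[Experience]:
        """Sample with replacement using rank-based weights 1/(rank + 1)."""
        if not self._items:
            return []
        weights = [1.0 / (rank + 1) for rank in range(len(self._items))]
        picks = self._rng.choices(self._items, weights=weights, k=batch_size)
        return [exp for _, _, exp in picks]

    def ranked(self) -> List[Tuple[float, Experience]]:
        """Return (score, experience) pairs, highest score first."""
        return [(-neg_score, exp) for neg_score, _, exp in self._items]


# Usage: near-optimal (imitation) and online (practice) experiences share one buffer.
buf = WeightedReplayBuffer(capacity=3)
buf.add(Experience(0, 1, 1.0, 1, "imitation"), score=5.0)
buf.add(Experience(1, 0, 0.2, 2, "practice"), score=1.0)
buf.add(Experience(2, 1, 0.8, 3, "practice"), score=3.0)
buf.add(Experience(3, 0, 0.1, 4, "practice"), score=0.5)  # rejected: buffer full, lower score
print([score for score, _ in buf.ranked()])  # [5.0, 3.0, 1.0]
batch = buf.sample(4)
```

Rank-based weights keep sampling insensitive to the absolute scale of the hypothetical scores, while still favouring high-scoring (for example, near-optimal) experiences over low-scoring ones.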

The design of reinforcing steel bars (rebars) is critical to reinforced concrete (RC) structures. Design codes generally require a large number of rebars, particularly at member connections, so rebar clashes (i.e., collisions and congestion) are often inevitable. Avoiding all possible clashes manually, or even with standard design software, is impractical, labor-intensive, and error-prone. Building information modeling (BIM) technology has been adopted by the architecture, engineering, and construction (AEC) industry for clash-free rebar design. However, most existing BIM-based approaches resolve clashes by moving components with an optimization algorithm and apply only to RC structures with regular shapes. In particular, the optimized rebar paths cannot be adjusted to avoid obstacles, which limits practical application. Furthermore, most existing studies do not learn from design codes and constructibility constraints, and therefore cannot automatically and intelligently arrange and adjust rebars to avoid the obstacles encountered in complex RC joints and frame structures. To address these shortcomings, the authors recently proposed an immediate reward-based multi-agent reinforcement learning (MARL) system with BIM for automatic clash-free rebar design of RC joints. However, because that MARL system requires an immediate reward to guide the learning of a rebar design, it fails for complex RC structures where immediate rewards are often unavailable. This study extends the previous work with Q-learning (a model-free reinforcement learning algorithm) for more realistic path planning that considers both immediate and delayed rewards in clash-free rebar design for real-world RC structures.
Specifically, the rebar design problem is treated as a multi-agent path-planning problem in which each rebar is an intelligent reinforcement learning agent. With Q-learning as the reinforcement learning engine, specific forms of the state, action, and immediate and delayed rewards are then developed for MARL-based automatic rebar design that accounts for more realistic constructibility constraints and design codes. Comprehensive experiments on three typical beam-column joints and a two-story RC building frame were conducted to evaluate the efficiency of the proposed method. The resulting rebar paths, success rates, and average time confirm that the proposed framework combining MARL and BIM is effective and efficient.
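
The abstract does not give the authors' state, action, or reward definitions. As a minimal sketch under assumed definitions, the Python code below reduces each rebar to an agent moving on a 2D grid (real rebar layouts are 3D and code-constrained), uses a step penalty and a clash penalty as immediate rewards and a terminal bonus as the delayed reward, and routes rebars one after another so that cells taken by earlier rebars become obstacles for later ones. That sequential scheme is a simplification, not the paper's MARL coordination; the grid, reward values, hyperparameters, and the names `step`, `train`, and `route` are all hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
QTable = Dict[Tuple[Cell, int], float]
ACTIONS: List[Cell] = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up


def step(pos: Cell, a: int, size: int, blocked: Set[Cell], goal: Cell) -> Tuple[Cell, float, bool]:
    """Hypothetical rewards: -1 per move and -10 for a clash or leaving the grid
    (immediate), +100 only when the rebar reaches its end point (delayed)."""
    nxt = (pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])
    if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in blocked:
        return pos, -10.0, False  # clash: the agent stays in place
    if nxt == goal:
        return nxt, 100.0, True
    return nxt, -1.0, False


def greedy(q: QTable, pos: Cell) -> int:
    return max(range(len(ACTIONS)), key=lambda b: q.get((pos, b), 0.0))


def train(start: Cell, goal: Cell, size: int, blocked: Set[Cell],
          episodes: int = 1000, alpha: float = 0.5, gamma: float = 0.95,
          eps: float = 0.2, max_steps: int = 200, seed: int = 0) -> QTable:
    """Tabular, epsilon-greedy Q-learning for a single rebar agent."""
    rng = random.Random(seed)
    q: QTable = {}
    for _ in range(episodes):
        pos = start
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(q, pos)
            nxt, r, done = step(pos, a, size, blocked, goal)
            # Bootstrapped target r + gamma * max_b Q(s', b); a terminal state contributes 0.
            future = 0.0 if done else max(q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (r + gamma * future - old)
            pos = nxt
            if done:
                break
    return q


def route(q: QTable, start: Cell, goal: Cell, size: int, blocked: Set[Cell],
          max_steps: int = 100) -> List[Cell]:
    """Follow the greedy policy; the path may not end at goal if training failed."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        nxt, _, _ = step(path[-1], greedy(q, path[-1]), size, blocked, goal)
        path.append(nxt)
    return path


# Usage: route rebar A first, then treat its cells as obstacles for rebar B.
SIZE = 6
obstacles = {(3, 2), (3, 3)}  # hypothetical embedded item blocking rebar B's straight run
a_start, a_goal = (0, 0), (0, 5)
path_a = route(train(a_start, a_goal, SIZE, obstacles), a_start, a_goal, SIZE, obstacles)
blocked_b = obstacles | set(path_a)
b_start, b_goal = (3, 0), (3, 5)
path_b = route(train(b_start, b_goal, SIZE, blocked_b), b_start, b_goal, SIZE, blocked_b)
print(path_a)
print(path_b)
```

Because the +100 bonus arrives only at the end point, it reaches earlier states solely through the bootstrapped term gamma * max Q(s', b). That propagation is what lets Q-learning exploit delayed rewards, which a scheme relying only on immediate rewards cannot do.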

Articles published on Model-free Reinforcement Learning Algorithm

SQLR: Short-Term Memory Q-Learning for Elastic Provisioning

Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

An Experience Replay Method Based on Tree Structure for Reinforcement Learning

UAV Networks Against Multiple Maneuvering Smart Jamming With Knowledge-Based Reinforcement Learning

Output event-triggered tracking synchronization of heterogeneous systems on directed digraph via model-free reinforcement learning

Efficient Robot Skills Learning with Weighted Near-Optimal Experiences Policy Optimization

Research on sports action training method based on generative confrontation network model and artificial intelligence

Low-Cost Multi-Agent Navigation via Reinforcement Learning With Multi-Fidelity Simulator

Perception-Action Coupling Target Tracking Control for a Snake Robot via Reinforcement Learning

Priority-Aware Reinforcement-Learning-Based Integrated Design of Networking and Control for Industrial Internet of Things

An optimal policy for joint compression and transmission control in delay-constrained energy harvesting IoT devices

Reinforcement Learning Based Decision Making of Operational Indices in Process Industry Under Changing Environment

PMA-DRL: A parallel model-augmented framework for deep reinforcement learning algorithms

Controller Optimization for Multirate Systems Based on Reinforcement Learning

Theory-Based Causal Transfer: Integrating Instance-Level Induction and Abstract-Level Structure Learning

Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks

Automated clash resolution for reinforcement steel design in concrete frames via Q-learning and Building Information Modeling

Dopamine transients do not act as model-free prediction errors during associative learning

An Enhanced Model-Free Reinforcement Learning Algorithm to Solve Nash Equilibrium for Multi-Agent Cooperative Game Systems