Robustness evaluation of offline reinforcement learning for robot control against action perturbations

Abstract

Offline reinforcement learning, which learns policies solely from pre-collected datasets without interacting with the environment, has been gaining attention. Like conventional online deep reinforcement learning, it is particularly promising for robot control applications. Nevertheless, its robustness to real-world challenges, such as joint actuator faults in robots, remains a critical concern. This study evaluates the robustness of existing offline reinforcement learning methods on legged robots from OpenAI Gym, using average episodic reward as the metric. To simulate failures, we inject both random perturbations and adversarial perturbations, the latter representing worst-case scenarios, into the joint torque signals. Our experiments show that existing offline reinforcement learning methods are significantly vulnerable to these action perturbations, and more vulnerable than online reinforcement learning methods, highlighting the need for more robust approaches in this field.
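A minimal sketch of this evaluation protocol in Python, assuming a Gymnasium MuJoCo task (e.g. Hopper or Walker2d) and a policy callable that maps observations to joint-torque actions; the paper's exact noise model, perturbation magnitudes, and adversarial attack are not reproduced here, and the adversarial (worst-case) variant would additionally require gradient access to a learned critic.

    import numpy as np
    import gymnasium as gym  # assumes gymnasium's MuJoCo environments are installed

    def average_return_under_noise(env_id, policy, noise_scale=0.1,
                                   episodes=10, seed=0):
        """Average episodic reward when joint-torque actions are randomly perturbed."""
        env = gym.make(env_id)
        low, high = env.action_space.low, env.action_space.high
        rng = np.random.default_rng(seed)
        returns = []
        for ep in range(episodes):
            obs, _ = env.reset(seed=seed + ep)
            total, done = 0.0, False
            while not done:
                action = np.asarray(policy(obs), dtype=np.float64)
                # Random torque perturbation scaled to the actuator range,
                # clipped so the perturbed action respects the torque limits.
                noise = noise_scale * (high - low) / 2.0 * rng.uniform(-1.0, 1.0, action.shape)
                obs, reward, terminated, truncated, _ = env.step(
                    np.clip(action + noise, low, high))
                total += reward
                done = terminated or truncated
            returns.append(total)
        env.close()
        return float(np.mean(returns))

Comparing this score with the unperturbed average episodic reward (noise_scale=0.0) quantifies the robustness gap that the study measures.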

Similar Papers
  • Research Article
  • Cited by 15
  • 10.1016/j.jobe.2023.106992
Predictive control of power demand peak regulation based on deep reinforcement learning
  • Sep 1, 2023
  • Journal of Building Engineering
  • Qiming Fu + 6 more

  • Research Article
  • Cited by 4
  • 10.1016/j.cja.2024.07.012
Data-driven offline reinforcement learning approach for quadrotor’s motion and path planning
  • Jul 9, 2024
  • Chinese Journal of Aeronautics
  • Haoran Zhao + 4 more

  • Research Article
  • Cited by 19
  • 10.3390/pr11051571
Combining Reinforcement Learning Algorithms with Graph Neural Networks to Solve Dynamic Job Shop Scheduling Problems
  • May 21, 2023
  • Processes
  • Zhong Yang + 2 more

Smart factories have attracted considerable attention from scholars for intelligent scheduling problems due to the complexity and dynamics of their production processes. The dynamic job shop scheduling problem (DJSP), one such intelligent scheduling problem, aims to produce an optimized sequence of scheduling decisions based on the real-time state of the job shop. Traditional reinforcement learning (RL) methods formulate the scheduling problem as a Markov process and use a custom reward function to obtain scheduling sequences for different real-time shop states. However, the definition of shop states often relies on the scheduling experience of the model constructor, which limits the optimization capability of the reinforcement learning model. In this paper, we combine a graph neural network (GNN) with a deep reinforcement learning (DRL) algorithm to solve the DJSP. An agent model mapping the job-shop state graph directly to scheduling rules is constructed, thus avoiding the reliance of traditional reinforcement learning methods on hand-crafted state feature vectors. In addition, a new reward function is defined, and the experimental results show that the proposed reward method is more effective. The effectiveness and feasibility of the model are demonstrated by comparison with general deep reinforcement learning algorithms on minimizing early and late completion times, which also lays a foundation for further work on the DJSP.

  • Research Article
  • Cited by 10
  • 10.2215/cjn.0000000000000084
Reinforcement Learning for Clinical Applications.
  • Feb 8, 2023
  • Clinical Journal of the American Society of Nephrology
  • Kia Khezeli + 5 more

  • Conference Article
  • Cited by 1
  • 10.1109/mlcr57210.2022.00013
Continuous Control for Autonomous Underwater Vehicle Path Following Using Deep Interactive Reinforcement Learning
  • Oct 1, 2022
  • Qilei Zhang + 5 more

With the increasing demand for ocean exploration, higher requirements for both autonomy and intelligence have been placed on the development of Autonomous Underwater Vehicles (AUVs). To this end, deep reinforcement learning methods have started being used in recent years to improve AUV autonomy and intelligence. However, the low learning efficiency and high learning cost of traditional deep reinforcement learning prevent its application to physical AUV systems in real underwater environments. Therefore, this paper proposes a deep interactive reinforcement learning method based on the Deep Deterministic Policy Gradient (DDPG) algorithm for continuous motion control in AUV path following. The highlight of the proposed method is the design of a new reward allocator. Specifically, unlike current deep interactive reinforcement learning methods, the human trainer provides a preferred action based on an evaluation of the AUV's current situation. The reward allocator then assigns rewards indirectly, based on the preferred action, to cope with the high frequency of continuous action changes of the AUV. The proposed method was tested on a sinusoidal curve-following task in the Gazebo simulation platform with the lab's AUV simulator. The experimental results and analysis show that, with the proposed method, AUV path following learns a stable policy about 100 episodes faster than learning from environmental rewards or human rewards alone.
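The reward allocator itself is only summarized above; a plausible sketch of the underlying idea, blending the environmental reward with a term that rewards closeness to the trainer's preferred continuous action, is shown below. The function name, the distance-based bonus, and the weight parameter are illustrative assumptions, not the authors' exact design.

    import numpy as np

    def allocate_reward(env_reward, action, preferred_action, weight=0.5):
        """Hypothetical reward allocator: the closer the executed continuous
        action is to the human trainer's preferred action, the larger the
        bonus; weight trades off environmental and human-derived reward."""
        closeness = -np.linalg.norm(np.asarray(action) - np.asarray(preferred_action))
        return (1.0 - weight) * env_reward + weight * closeness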

  • Research Article
  • Cited by 24
  • 10.1016/j.tics.2020.09.002
Artificial Intelligence and the Common Sense of Animals.
  • Oct 8, 2020
  • Trends in Cognitive Sciences
  • Murray Shanahan + 3 more

  • Research Article
  • Cited by 12
  • 10.3390/en9090725
Application of a Gradient Descent Continuous Actor-Critic Algorithm for Double-Side Day-Ahead Electricity Market Modeling
  • Sep 9, 2016
  • Energies
  • Huiru Zhao + 4 more

An important goal of China's electric power system reform is to create a double-side day-ahead wholesale electricity market in the future, where suppliers (represented by GenCOs) and demanders (represented by DisCOs) compete simultaneously with each other in one market. Therefore, scientifically modeling and simulating the dynamic bidding process and the equilibrium in the double-side day-ahead electricity market is important not only to some developed countries but also to China, providing a bidding decision-making tool to help GenCOs and DisCOs obtain more profit in market competition. Meanwhile, it can also provide an economic analysis tool to help government officials design proper market mechanisms and policies. The traditional dynamic game model and table-based reinforcement learning algorithms have already been employed in day-ahead electricity market modeling. However, those models rest on assumptions, such as taking the probability distribution function of the market clearing price (MCP) and each rival's bidding strategy as common knowledge (in dynamic game market models), or assuming discrete state and action sets for every agent (in table-based reinforcement learning market models), that no longer hold in realistic settings. In this paper, a modified reinforcement learning method, called the gradient descent continuous Actor-Critic (GDCAC) algorithm, is employed in double-side day-ahead electricity market modeling and simulation. This algorithm not only dispenses with the abovementioned unrealistic assumptions but also copes with Markov decision processes with continuous state and action sets, just like the real electricity market. Meanwhile, the time complexity of the proposed model is only O(n). Simulation results from employing the proposed model in the double-side day-ahead electricity market show the superiority of the approach in terms of participants' profits and social welfare compared with traditional reinforcement learning methods.

  • Research Article
  • Cited by 4
  • 10.1007/s43674-022-00037-9
An unsupervised autonomous learning framework for goal-directed behaviours in dynamic contexts
  • Jun 1, 2022
  • Advances in Computational Intelligence
  • Chinedu Pascal Ezenkwu + 1 more

Due to their dependence on a task-specific reward function, reinforcement learning agents are ineffective at responding to a dynamic goal or environment. This paper seeks to overcome this limitation of traditional reinforcement learning through a task-agnostic, self-organising autonomous agent framework. The proposed algorithm is a hybrid of TMGWR, for self-adaptive learning of sensorimotor maps, and value iteration, for goal-directed planning. TMGWR has previously been demonstrated to overcome the problems associated with competing sensorimotor techniques such as SOM, GNG, and GWR; these problems include difficulty in setting a suitable number of neurons for a task, inflexibility, inability to cope with non-Markovian environments, sensitivity to noise, and inappropriate joint representation of sensory observations and actions. However, the binary sensorimotor-link implementation in the original TMGWR causes catastrophic forgetting when the agent experiences changes in the task, and it is therefore not suitable for self-adaptive learning. A new sensorimotor-link update rule is presented in this paper to enable adaptation of the sensorimotor map to new experiences. The paper demonstrates that the TMGWR-based algorithm has better sample efficiency than model-free reinforcement learning and better self-adaptivity than both model-free and traditional model-based reinforcement learning algorithms. Moreover, the algorithm is shown to incur the lowest overall computational cost when compared to traditional reinforcement learning algorithms.

  • Research Article
  • Cited by 25
  • 10.13031/trans.13633
Deep Reinforcement Learning-Based Irrigation Scheduling
  • Jan 1, 2020
  • Transactions of the ASABE
  • Yanxiang Yang + 5 more

Highlights: Deep reinforcement learning-based irrigation scheduling is proposed to determine the amount of irrigation required at each time step, considering soil moisture level, evapotranspiration, forecast precipitation, and crop growth stage. The proposed methodology was compared in simulation with traditional irrigation scheduling approaches and some machine learning-based scheduling approaches.
Abstract: Machine learning has been widely applied in many areas, with promising results and large potential. In this article, deep reinforcement learning-based irrigation scheduling is proposed. This approach can automate the irrigation process and achieve highly precise water application that results in higher simulated net return. Using this approach, the irrigation controller can automatically determine the optimal or near-optimal water application amount. Traditional reinforcement learning can be superior to traditional periodic and threshold-based irrigation scheduling. However, it fails to accurately represent a real-world irrigation environment due to its limited state space. Compared with traditional reinforcement learning, the deep reinforcement learning method can better model a real-world environment based on multi-dimensional observations. Simulations for various weather conditions and crop types show that the proposed deep reinforcement learning irrigation scheduling can increase net return.
Keywords: Automated irrigation scheduling, Deep reinforcement learning, Machine learning.

  • Book Chapter
  • Cited by 84
  • 10.1007/978-981-15-4095-0_2
Introduction to Reinforcement Learning
  • Jan 1, 2020
  • Zihan Ding + 3 more

In this chapter, we introduce the fundamentals of classical reinforcement learning and provide a general overview of deep reinforcement learning. We first start with the basic definitions and concepts of reinforcement learning, including the agent, environment, action, and state, as well as the reward function. Then, we describe a classical reinforcement learning problem, the bandit problem, to provide the readers with a basic understanding of the underlying mechanism of traditional reinforcement learning. Next, we introduce the Markov process, together with the Markov reward process and the Markov decision process. These notions are the cornerstones in formulating reinforcement learning tasks. The combination of the Markov reward process and value function estimation produces the core results used in most reinforcement learning methods: the Bellman equations. The optimal value functions and optimal policy can be derived through solving the Bellman equations. Three main approaches for solving the Bellman equations are then introduced: dynamic programming, Monte Carlo method, and temporal difference learning. We further introduce deep reinforcement learning for both policy and value function approximation in policy optimization. The contents in policy optimization are introduced in two main categories: value-based optimization and policy-based optimization. In value-based optimization, the gradient-based methods are introduced for leveraging deep neural networks, like Deep Q-Networks. In policy-based optimization, the deterministic policy gradient and stochastic policy gradient are introduced in detail with sufficient mathematical proofs. The combination of value-based and policy-based optimization produces the popular actor-critic structure, which leads to a large number of advanced deep reinforcement learning algorithms. This chapter will lay a foundation for the rest of the book, as well as providing the readers with a general overview of deep reinforcement learning.
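For reference, the Bellman equations that the chapter places at the core of most reinforcement learning methods take the standard form below (Sutton-Barto notation: policy \pi, transition model p, discount factor \gamma); the first is the Bellman expectation equation, the second the Bellman optimality equation whose solution yields the optimal value function and policy.

    V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma V^{\pi}(s') \bigr]

    V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma V^{*}(s') \bigr]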

  • Research Article
  • Cited by 3
  • 10.3390/a13090239
Feasibility Analysis and Application of Reinforcement Learning Algorithm Based on Dynamic Parameter Adjustment
  • Sep 22, 2020
  • Algorithms
  • Menglin Li + 3 more

Reinforcement learning, as a branch of machine learning, has gradually been applied in the control field. However, in practical applications, the setting of hyperparameters for deep reinforcement learning networks still follows the empirical practice of traditional machine learning (supervised and unsupervised learning). This practice ignores information generated as the agent explores the environment, which is contained in the updates of the reinforcement learning value function and affects the convergence and cumulative return of reinforcement learning. The reinforcement learning algorithm based on dynamic parameter adjustment is a new method for setting the learning rate of deep reinforcement learning. Building on the traditional method of setting reinforcement learning parameters, this method analyzes the advantages of different learning rates at different stages of reinforcement learning and dynamically adjusts the learning rate based on the temporal-difference (TD) error, so as to exploit the advantages of different learning rates at different stages and improve the algorithm's suitability for practical application. At the same time, by combining the Robbins-Monro approximation algorithm with the deep reinforcement learning algorithm, it is proved that dynamically regulating the learning rate can theoretically meet the convergence requirements of the intelligent control algorithm. In the experiments, the effect of this method is analyzed in the continuous control scenario of the standard "Car-on-the-Hill" reinforcement learning environment, verifying that the new method achieves better results in practice than traditional reinforcement learning. Based on the model characteristics of deep reinforcement learning, a more suitable method for setting the learning rate of the deep reinforcement learning network is proposed, and its feasibility is proved both in theory and in application. The method of setting the learning rate parameter is therefore worthy of further development and research.
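The paper's central mechanism, choosing the learning rate as a function of the TD error, could be sketched as follows. The specific mapping (a bounded, monotone function of the TD error magnitude) and the parameter names are illustrative assumptions, since the abstract does not give the authors' exact rule.

    import math

    def adaptive_learning_rate(td_error, lr_min=1e-4, lr_max=1e-2, sensitivity=1.0):
        """Hypothetical schedule: large TD errors (typical early in training)
        push the rate toward lr_max for fast learning, while small TD errors
        (near convergence) let it decay toward lr_min, in line with the
        stability requirements the paper links to Robbins-Monro conditions."""
        return lr_min + (lr_max - lr_min) * math.tanh(sensitivity * abs(td_error))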

  • Conference Article
  • 10.1145/3374587.3374595
Reinforcement Learning Based on Multi-subnet Clusters
  • Dec 6, 2019
  • Xiaobing Wang + 1 more

The main task of reinforcement learning is to enable the agent to obtain the greatest reward from the environment. Many reinforcement learning methods have been proposed and have achieved notable results. However, many of them remain too inefficient to meet the demands of some applications. To address this problem, this paper proposes a reinforcement learning algorithm based on multi-subnet clusters (MSC-RL). The proposed network consists of multiple subnet clusters and a primary storage network. Each subnet cluster is composed of multiple subnets and one sub-storage network. Within a cluster, the subnets explore the solution space simultaneously and save the information they find to the sub-storage network; at regular intervals, the cluster saves this information to the primary storage network. In traditional reinforcement learning, there is not enough interaction between independent subnets, and insufficient information exchange can cause the algorithm to fall into local optima. MSC-RL exchanges the information found by each subnet through the sub-storage network, realizing information interaction within the cluster, while each cluster uses the primary storage network for interaction across clusters. This enhances the information exchange between subnets and improves the algorithm's ability to optimize. This paper uses Atari games to verify the performance of the proposed method and compares it with several mainstream reinforcement learning methods. The experimental results show that the proposed algorithm outperforms these mainstream methods on Atari game performance.

  • Research Article
  • Cited by 10
  • 10.32604/cmc.2022.022952
Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control
  • Jan 1, 2022
  • Computers, Materials & Continua
  • Faizan Rasheed + 3 more

This paper investigates the use of a multi-agent deep Q-network (MADQN) to address the curse-of-dimensionality issue that occurs in the traditional multi-agent reinforcement learning (MARL) approach. The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions, particularly rainfall. MADQN is based on the deep Q-network (DQN), which integrates traditional reinforcement learning (RL) with the newly emerging deep learning (DL) approach. MADQN enables traffic light controllers to learn, exchange knowledge with neighboring agents, and select optimal joint actions in a collaborative manner. A case study based on a real traffic network is conducted as part of a sustainable urban city project in Sunway City, Kuala Lumpur, Malaysia. An investigation using a grid traffic network (GTN) is also performed to confirm that the proposed scheme is effective in a traditional traffic network. The proposed scheme is evaluated using two simulation tools, namely Matlab and Simulation of Urban Mobility (SUMO), and is shown to reduce the cumulative delay of vehicles by up to 30% in the simulations.
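As background for the DQN foundation mentioned above (general to DQN, not specific to this paper), each controller's Q-network is regressed toward the standard target

    y_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-})

where \theta^{-} are the parameters of a periodically synchronized target network; MADQN extends this single-agent update with knowledge exchange among neighboring intersection controllers.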

  • Conference Article
  • Cited by 1
  • 10.1109/bracis.2017.63
Improving Reinforcement Learning Results with Qualitative Spatial Representation
  • Oct 1, 2017
  • Thiago Pedro Donadon Homem + 4 more

Reinforcement learning and qualitative spatial reasoning methods have been successfully applied to create agents able to solve artificial intelligence problems in games and in robotics, both simulated and real. Generally, reinforcement learning methods represent objects' positions as quantitative values and perform experiments on these values. However, the human commonsense understanding of the world is qualitative. This work proposes a hybrid method, named QRL, that uses a qualitative formalism with reinforcement learning and is able to obtain better results than traditional methods. We applied this proposal in the robot soccer domain and compared the results with a traditional reinforcement learning method. The results show that, by using a qualitative spatial representation with reinforcement learning, the agent can learn optimal policies and score more goals than with a quantitative representation.

  • Research Article
  • Cited by 3
  • 10.3390/electronics13071281
How to Design Reinforcement Learning Methods for the Edge: An Integrated Approach toward Intelligent Decision Making
  • Mar 29, 2024
  • Electronics
  • Guanlin Wu + 4 more

Extensive research has been carried out on reinforcement learning methods. The core idea of reinforcement learning is to learn by trial and error, and it has been successfully applied to robotics, autonomous driving, gaming, healthcare, resource management, and other fields. However, when building reinforcement learning solutions at the edge, there are not only the challenges of data hunger and insufficient computational resources but also the difficulty of meeting the model's requirements for efficiency, generalization, robustness, and so on with a single reinforcement learning method. Existing solutions rely on expert knowledge for the design of edge-side integrated reinforcement learning methods and lack the high-level system architecture design needed to support wider generalization and application. Therefore, in this paper, instead of surveying reinforcement learning systems, we survey the most commonly used options for each part of the architecture from the point of view of integrated application. We present the characteristics of traditional reinforcement learning in several respects and design a corresponding integration framework based on them. In the process, we provide a complete primer on the design of reinforcement learning architectures while demonstrating the flexibility of the various parts of the architecture to be adapted to the characteristics of different edge tasks. Overall, reinforcement learning has become an important tool in intelligent decision making, but it still faces many challenges in practical edge-computing applications. The aim of this paper is to provide researchers and practitioners with a new, integrated perspective to better understand and apply reinforcement learning in edge decision-making tasks.

More from: International Journal of Advanced Robotic Systems
  • Research Article
  • 10.1177/17298806251360659
Development of a low-cost modular snake-like robot with 2-DOF modules for rescue operations in collapsed environments with fast communication
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • G Seeja + 3 more

  • Research Article
  • 10.1177/17298806251325135
Research on variable impedance control of SEA-driven upper limb rehabilitation robot based on singular perturbation method
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Bingshan Hu + 4 more

  • Research Article
  • 10.1177/17298806251348118
Automatic cutting and suturing control system based on improved FP16 visual recognition algorithm
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Jiayin Wang + 1 more

  • Research Article
  • 10.1177/17298806251342040
Infrared object detection for robot vision based on multiple focus diffusion and task interaction alignment
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Jixu Zhang + 6 more

  • Research Article
  • 10.1177/17298806251360454
Robustness evaluation of offline reinforcement learning for robot control against action perturbations
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Shingo Ayabe + 3 more

  • Research Article
  • 10.1177/17298806251356720
Real-time path planning for Mecanum-wheeled robots with type-2 fuzzy logic controller
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Thanh-Lam Bui + 2 more

  • Research Article
  • 10.1177/17298806251352007
Smooth likelihood-based collision avoidance for polygon shaped and differential drive vehicles
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Yang Zhou + 1 more

  • Research Article
  • 10.1177/17298806251363648
Evaluation of a model-driven approach for the integration of robot operating system-based complex robot systems
  • Jul 1, 2025
  • International Journal of Advanced Robotic Systems
  • Nadia Hammoudeh García + 3 more

  • Research Article
  • 10.1177/17298806251352059
Intraoperative computed tomography-guided robotic needle biopsy system with real-time imaging ability and remote-center-of-motion control
  • May 1, 2025
  • International Journal of Advanced Robotic Systems
  • Zheng-Yang Lai + 6 more

  • Research Article
  • 10.1177/17298806251339684
A two-wheeled robotic wheelchair with a slidable seat for elderly and people with lower limb disabilities
  • May 1, 2025
  • International Journal of Advanced Robotic Systems
  • Munyu Kim + 6 more
