Off-policy Reinforcement Learning Research Articles

CEM-TD3 combines the simple cross-entropy method (CEM) with Twin Delayed Deep Deterministic policy gradient (TD3) and achieves a satisfactory trade-off between performance and sample efficiency. However, we find that CEM-TD3 cannot fully address the low policy-search efficiency caused by CEM, and that the policy gradient learning introduced by TD3 weakens the diversity of individuals in the population. In this paper, we propose Double Buffers CEM-TD3 (DBCEM-TD3), which improves both the CEM and the TD3 components. For CEM, DBCEM-TD3 maintains an actor buffer that stores the population required for evolution. In each iteration, it generates only a small number of new actors to replace the poorest actors in the actor buffer, which makes evolution more efficient. The fitness of individuals in the actor buffer decays exponentially with time, which helps avoid premature convergence of the mean actor. For TD3, DBCEM-TD3 maintains a critic buffer containing as many critics as there are actors generated in each iteration, and each critic is trained independently on samples from the shared replay buffer. In each iteration, the newly generated actors are guided by different critics. This yields more diverse behaviors among the learned actors, so richer experiences are collected during the evaluation phase. We evaluate DBCEM-TD3 on five continuous control tasks provided by OpenAI Gym, where it outperforms CEM-TD3, TD3, and other classic off-policy reinforcement learning algorithms in both performance and sample efficiency.
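The actor-buffer idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: individuals are plain parameter vectors rather than neural-network actors, the TD3 gradient step and the critic buffer are omitted, and every name here (`buffered_cem`, `decayed_fitness`, `decay_rate`, `n_new`, `n_elite`) is hypothetical. The exact decay formula is also an assumption; the abstract only states that fitness decreases exponentially with time, and the sketch assumes non-negative fitness values.

```python
import math
import random


def decayed_fitness(raw_fitness, age, decay_rate=0.05):
    """Discount a stored (non-negative) fitness exponentially with age."""
    return raw_fitness * math.exp(-decay_rate * age)


def buffered_cem(evaluate, dim, buffer_size=10, n_new=3, n_elite=5,
                 iterations=50, decay_rate=0.05, seed=0):
    """Toy CEM loop that keeps a persistent buffer of individuals.

    Hypothetical simplification: individuals are parameter vectors and
    no policy-gradient update is applied to them.
    """
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim

    def sample():
        # Draw one individual from the current search distribution.
        params = [rng.gauss(m, s) for m, s in zip(mean, std)]
        return {"params": params, "fitness": evaluate(params), "age": 0}

    def score(entry):
        return decayed_fitness(entry["fitness"], entry["age"], decay_rate)

    buffer = [sample() for _ in range(buffer_size)]
    for _ in range(iterations):
        for entry in buffer:
            entry["age"] += 1
        # Replace only the n_new individuals with the lowest decayed fitness.
        buffer.sort(key=score)
        for i in range(n_new):
            buffer[i] = sample()
        # Refit the search distribution to the best-scoring individuals.
        elite = sorted(buffer, key=score, reverse=True)[:n_elite]
        for d in range(dim):
            values = [e["params"][d] for e in elite]
            mean[d] = sum(values) / len(values)
            variance = sum((v - mean[d]) ** 2 for v in values) / len(values)
            std[d] = math.sqrt(variance) + 1e-3  # small floor keeps std positive
    return mean, buffer


# Toy objective with its maximum at (1, -1); the fitness is always positive.
def toy_fitness(p):
    return 1.0 / (1.0 + (p[0] - 1.0) ** 2 + (p[1] + 1.0) ** 2)


mean, buffer = buffered_cem(toy_fitness, dim=2)
```

Because only `n_new` individuals are evaluated per iteration, the evaluation cost per iteration is lower than resampling the whole population, while the age-based discount gradually pushes stale individuals out of the elite set.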

Fairness-aware recommendation alleviates discrimination issues to build trustworthy recommendation systems. Explaining the causes of unfair recommendations is critical, as it supports fairness diagnostics and thus secures users’ trust in recommendation models. Existing fairness explanation methods suffer from high computational burdens due to the large-scale search space and the greedy nature of the explanation search process. In addition, they perform feature-level optimizations with continuous values, which are not applicable to discrete attributes such as gender and age. In this work, we adopt counterfactual explanations from causal inference and propose to generate attribute-level counterfactual explanations that adapt to discrete attributes in recommendation models. We use real-world attributes from Heterogeneous Information Networks (HINs) to empower counterfactual reasoning on discrete attributes. We propose Counterfactual Explanation for Fairness (CFairER), which generates attribute-level counterfactual explanations from HINs for item exposure fairness. CFairER uses off-policy reinforcement learning to seek high-quality counterfactual explanations, with attentive action pruning narrowing the search space of candidate attributes. The resulting counterfactual explanations provide rational and proximate explanations for model fairness. Extensive experiments demonstrate that the proposed model can generate faithful explanations while maintaining favorable recommendation performance.
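The idea of pruning candidate actions by attention can be sketched as follows. The abstract does not specify CFairER's attention mechanism, so this example assumes a plain dot-product softmax over hypothetical attribute embeddings and keeps the top `k` candidates; the function names, the embedding values, and the attribute names are all illustrative assumptions.

```python
import math


def attention_scores(query, candidates):
    """Softmax over dot products between a query vector and each candidate."""
    logits = [sum(q * c for q, c in zip(query, vec)) for vec in candidates.values()]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(candidates, exps)}


def prune_actions(query, candidates, k):
    """Keep the k candidate attributes with the highest attention score."""
    scores = attention_scores(query, candidates)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 2-dimensional embeddings for three discrete attributes.
attribute_embeddings = {
    "gender": [1.0, 0.0],
    "age": [0.0, 1.0],
    "genre": [0.5, 0.5],
}
kept = prune_actions([2.0, 0.0], attribute_embeddings, k=2)
```

An agent would then choose its next action only among the attributes in `kept`, so the size of the action space per step is bounded by `k` rather than by the full attribute set.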

Articles published on Off-policy Reinforcement Learning

Path Planning for Unmanned Aerial Vehicle via Off-Policy Reinforcement Learning With Enhanced Exploration

Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method

Heterogeneous unmanned swarm formation containment control based on reinforcement learning

Fully Data-Driven Robust Output Formation Tracking Control for Heterogeneous Multiagent System With Multiple Leaders and Actuator Faults

Integrating an Ensemble Reward System into an Off-Policy Reinforcement Learning Algorithm for the Economic Dispatch of Small Modular Reactor-Based Energy Systems

Mixed experience sampling for off-policy reinforcement learning

Integrating human learning and reinforcement learning: A novel approach to agent training

Utilizing reinforcement learning for de novo drug design

Optimal Tracking Control of Heterogeneous MASs Using Event-Driven Adaptive Observer and Reinforcement Learning

Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling

Double Buffers CEM-TD3: More Efficient Evolution and Richer Exploration

Counterfactual Explanation for Fairness in Recommendation

Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning

Distributed Minmax Strategy for Multiplayer Games: Stability, Robustness, and Algorithms

Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS With Unidentified Exosystem Dynamics

Off-policy RL algorithms can be sample-efficient for continuous control via sample multiple reuse

Re-attentive experience replay in off-policy reinforcement learning

Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games

Model-free control of underwater vehicle-manipulator system interacting with unknown environments

Intelligent optimization method for real-time decision-making in laminated cooling configurations through reinforcement learning
