State-action Space Research Articles

Combined cooling, heating, and power (CCHP) systems coupled with renewable energy generation and energy storage can form a low-carbon, multi-energy complementary, and flexible energy system. However, the inclusion of renewable resources and energy storage poses significant challenges to the operational management of such systems. Conventional algorithms are limited when solving nonlinear, non-convex optimization problems under uncertainty; thus, deep reinforcement learning (DRL) is considered the most effective method for these problems because of its powerful nonlinear fitting ability, its model-free nature, and its capability to solve decision-making problems. This study proposes a novel DRL-based operation strategy optimization model that minimizes the operation cost of an energy system composed of CCHP, photovoltaic generation, and an energy storage system. The optimization problem is transformed into a Markov decision process (MDP), and the state and action spaces are modeled differently for the summer and winter scenarios. Two models for handling action constraints are proposed and their performance is tested. Two DRL algorithms, Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3), were each verified by comparison with two conventional methods, particle swarm optimization (PSO) and mathematical programming, under perfect input conditions. Results show that setting up separate training environments for summer and winter yields better optimization results, and the better-performing constraint-handling model was validated through its operation strategy and optimization results. The performance of the TD3 method is comparable to the theoretical benchmark, with an average error of approximately 5%. The computation time is only 0.001 s for a single-step online decision and 0.006 s for a 24-step online decision, which significantly improves operational efficiency and demonstrates the adaptability and computational performance of DRL methods for optimization.
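The abstract does not describe how TD3 is configured or what the two action-constraint models are. As a generic reference point only, the sketch below computes the standard TD3 critic target: target policy smoothing (clipped Gaussian noise on the target action) and clipped double-Q (the minimum of two target critics). Clipping the smoothed action to a box [action_low, action_high] is an assumption standing in for feasibility limits such as storage charge/discharge rates; it is not claimed to be either of the paper's constraint-handling models. The function name `td3_target` and the toy lambdas replacing neural networks are hypothetical.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=None):
    """Critic regression targets y for a batch of transitions, TD3-style."""
    rng = np.random.default_rng() if rng is None else rng
    next_action = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    # Keep the smoothed action inside the (assumed box-shaped) feasible set.
    next_action = np.clip(next_action + noise, action_low, action_high)
    # Clipped double-Q: take the smaller of the two target-critic estimates.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))
    # Terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min


# Toy stand-ins for the target networks (hypothetical, for illustration only).
actor = lambda s: 2.0 * s                   # can propose infeasible actions
critic1 = lambda s, a: (s + a)[:, 0]
critic2 = lambda s, a: (s + 2.0 * a)[:, 0]

next_state = np.array([[0.5], [2.0]])
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
y = td3_target(reward, next_state, done, actor, critic1, critic2,
               gamma=0.9, policy_noise=0.0)
```

With the noise disabled, the actor proposes actions 1.0 and 4.0; the second is clipped to 1.0 before the critics see it, so the targets are 1 + 0.9 × min(1.5, 2.5) = 2.35 for the first transition and 0.0 for the terminal second one. The third TD3 ingredient, delaying actor updates relative to critic updates, belongs in the training loop and is not shown.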

In this paper, we define a novel inverse reinforcement learning (IRL) problem in which the demonstrations are multi-intention, i.e., collected from experts with multiple intentions; unlabeled, i.e., without intention labels; and partially overlapping, i.e., shared between multiple intentions. In the presence of overlapping demonstrations, current IRL methods developed to handle multi-intention and unlabeled demonstrations cannot successfully learn the underlying reward functions. To address this limitation, we propose a novel clustering-based approach to disentangle the observed demonstrations and experimentally validate its advantages. Traditional clustering-based approaches to multi-intention IRL, which build on model-based reinforcement learning (RL), formulate the problem using parametric density estimation. However, in high-dimensional environments with unknown system dynamics, i.e., in model-free RL, parametric density estimation is tractable only up to the density normalization constant. To overcome this, we formulate the problem as a mixture of logistic regressions that directly handles the unnormalized density. To study the challenges posed by overlapping demonstrations, we introduce the concepts of a shared pair, which is a state-action pair shared by more than one intention, and separability, which reflects how well the multiple intentions can be separated in the joint state-action space. We provide theoretical analyses under the global optimality condition and in the presence of shared pairs. Furthermore, we conduct extensive experiments on four simulated robotics tasks, extended to accept different intentions with specific levels of separability, and on a synthetic driver task developed to control separability directly. We evaluate the existing baselines on our defined problem and demonstrate, theoretically and experimentally, the advantages of our clustering-based solution, especially as the separability of the demonstrations decreases.
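To make the shared-pair definition concrete, the sketch below finds the state-action pairs that occur in the demonstrations of more than one intention. It assumes discrete, hashable states and actions; continuous spaces would need a tolerance or density-based notion of "same pair". The intention labels are used only to diagnose a synthetic dataset whose ground truth is known; in the problem the abstract defines, the demonstrations are unlabeled. The `shared_fraction` score is a hypothetical overlap proxy, not the paper's separability measure, which the abstract does not define, and both function names are illustrative.

```python
from collections import defaultdict
from typing import Collection, Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (state, action)


def shared_pairs(demos: Dict[str, Collection[Pair]]) -> Set[Pair]:
    """State-action pairs that appear in the demonstrations of more than one intention."""
    owners = defaultdict(set)  # pair -> set of intentions that visited it
    for intention, pairs in demos.items():
        for pair in pairs:
            owners[pair].add(intention)
    return {pair for pair, who in owners.items() if len(who) > 1}


def shared_fraction(demos: Dict[str, Collection[Pair]]) -> float:
    """Hypothetical overlap proxy: share of distinct pairs that are shared pairs.

    Higher values suggest the intentions are harder to separate. This is NOT
    the paper's separability measure, which the abstract does not define.
    """
    distinct = set()
    for pairs in demos.values():
        distinct.update(pairs)
    return len(shared_pairs(demos)) / len(distinct) if distinct else 0.0


# Two toy intentions that only agree on what to do in state 2.
demos = {
    "go_left": [(0, "left"), (1, "left"), (2, "wait")],
    "go_right": [(0, "right"), (1, "right"), (2, "wait")],
}
print(shared_pairs(demos))     # {(2, 'wait')}
print(shared_fraction(demos))  # 0.2
```

Here one of the five distinct pairs is shared, so a clustering method has no way to attribute (2, "wait") to a single intention from that pair alone; the more such pairs a dataset contains, the less separable the intentions become.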

Articles published on State-action Space

Deep Reinforcement Learning-Assisted Optimization for Resource Allocation in Downlink OFDMA Cooperative Systems

Efficient reinforcement learning with least-squares soft Bellman residual for robotic grasping

Autonomous collision avoidance system in a multi-ship environment based on proximal policy optimization method

Path Planning for Ferry Crossing Inland Waterways Based on Deep Reinforcement Learning

Adaptive Weight Tuning of EWMA Controller via Model-Free Deep Reinforcement Learning

Multiagent Soft Actor–Critic for Traffic Light Timing

A Centralized Routing for Lifetime and Energy Optimization in WSNs Using Genetic Algorithm and Least-Square Policy Iteration

Path Planning of Unmanned Aerial Vehicle in Complex Environments Based on State-Detection Twin Delayed Deep Deterministic Policy Gradient

Classical Actor-Critic Applied to the Control of a Self-Regulatory Process

Joint Scheduling of Proactive Pushing and On-Demand Transmission Over Shared Spectrum for Profit Maximization

Deep Reinforcement Learning Based Active Pantograph Control Strategy in High-Speed Railway

PPO-Based PDACB Traffic Control Scheme for Massive IoV Communications

A Novel Exploration-Exploitation-Based Adaptive Law for Intelligent Model-Free Control Approaches

Congestion-Aware Path Coordination Game With Markov Decision Process Dynamics

Intra-Domain Knowledge Reuse Assisted Reinforcement Learning for Fast Anti-Jamming Communication

A Hybrid Linear Programming-Reinforcement Learning Method for Optimal Energy Hub Management

Reinforcement learning using Deep Q networks and Q learning accurately localizes brain tumors on MRI with very small training sets

Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

Operation strategy optimization of combined cooling, heating, and power systems with energy storage and renewable energy based on deep reinforcement learning

Model-free inverse reinforcement learning with multi-intention, unlabeled, and overlapping demonstrations