Kernel-based reinforcement learning has received increasing attention because it requires less prior knowledge than linear approximation and neural networks. Online kernel-based updating, however, is hindered by catastrophic forgetting (interference). Sparse representation is a key technique for addressing this issue, but existing methods fail to satisfy four criteria simultaneously: learnability, nonprior, nontruncation, and explicitness. In this paper, we present attentive kernel-based value function approximation, a sparse representation that is learnable, nonprior, nontruncated, and explicit. We propose the online attentive kernel-based temporal difference (OAKTD) algorithm, which employs two-timescale optimization, and provide a convergence analysis for the proposed algorithm. Experimental results show that OAKTD outperforms online kernel-based TD learning algorithms, as well as TD learning with Tile Coding, on classical control tasks: Mountain Car, Acrobot, CartPole, and Puddle World.
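To make the idea concrete, the following is a minimal sketch (not the paper's OAKTD algorithm) of a kernel-based value function with an explicit, attention-style sparsification and a single TD(0) update; the Gaussian kernel, top-k selection rule, and all constants are illustrative assumptions.

```python
import numpy as np

def gaussian_kernels(s, centers, bandwidth=0.5):
    # k(s, c_i) = exp(-||s - c_i||^2 / (2 * bandwidth^2)) for each center c_i
    d2 = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def attentive_sparse_value(s, centers, weights, top_k=3, bandwidth=0.5):
    # Attention scores over kernel similarities (softmax, shifted for stability)
    k = gaussian_kernels(s, centers, bandwidth)
    att = np.exp(k - k.max())
    att /= att.sum()
    # Explicit sparsification: keep only the top_k most-attended kernels
    mask = np.zeros_like(att)
    mask[np.argsort(att)[-top_k:]] = 1.0
    phi = mask * k                      # sparse feature vector
    return float(weights @ phi), phi

def td_update(weights, phi, reward, v, v_next, gamma=0.99, alpha=0.1):
    # TD(0): w <- w + alpha * delta * phi, with delta the TD error
    delta = reward + gamma * v_next - v
    return weights + alpha * delta * phi

# Toy usage on a 2-D state space with 10 hypothetical kernel centers
rng = np.random.default_rng(0)
centers = rng.uniform(-1, 1, size=(10, 2))
w = np.zeros(10)
s, s_next = np.array([0.1, -0.2]), np.array([0.15, -0.1])
v, phi = attentive_sparse_value(s, centers, w)
v_next, _ = attentive_sparse_value(s_next, centers, w)
w = td_update(w, phi, reward=1.0, v=v, v_next=v_next)
```

Because the update touches only the top-k active kernels, learning a new region of the state space perturbs few weights elsewhere, which is the intuition behind using sparse representations against interference.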