In the Industrial Internet of Things (IIoT), devices with limited computing power and energy storage often rely on offloading tasks to edge servers for processing. However, existing offloading methods are plagued by high device communication costs and unstable training. Deep reinforcement learning (DRL) has therefore emerged as a promising solution to the computation offloading problem. In this paper, we propose a DRL-based framework called the multi-agent twin delayed shared deep deterministic policy gradient algorithm (MASTD3). First, we formulate task offloading as a long-term optimization problem, which helps each device decide whether to execute a task locally or remotely and leads to more effective offloading management. Second, we enhance MASTD3 with a prioritized experience replay buffer mechanism and a model sample replay buffer mechanism, improving sample utilization and overcoming the cold-start problem associated with long-term optimization. Moreover, we refine the actor-critic structure so that all agents share a single critic network, which accelerates convergence during training and reduces computational cost at runtime. Finally, experimental results demonstrate that MASTD3 effectively addresses the proportional offloading problem, improving performance by 44.32%, 29.26%, and 17.47% over DDPQN, MADDPG, and FLoadNet, respectively.
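To make the shared-critic idea concrete, the sketch below shows one possible PyTorch structure in which each agent keeps its own actor while all agents query a single twin (TD3-style) critic over the joint state-action. This is a minimal illustration under assumed names and dimensions (Actor, SharedTwinCritic, hidden sizes, number of agents), not the authors' implementation.

```python
# Illustrative sketch, NOT the authors' code: per-agent actors with one shared twin critic.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """One policy network per device/agent, mapping its local state to an offloading ratio."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Sigmoid(),  # offloading ratio in [0, 1]
        )

    def forward(self, state):
        return self.net(state)

class SharedTwinCritic(nn.Module):
    """A single twin (TD3-style) critic evaluated on the joint state-action of all agents."""
    def __init__(self, joint_state_dim, joint_action_dim, hidden=256):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(joint_state_dim + joint_action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, joint_state, joint_action):
        x = torch.cat([joint_state, joint_action], dim=-1)
        return self.q1(x), self.q2(x)  # twin estimates; the min is used for the target

# Usage: every agent keeps its own actor, but value estimation for all agents
# flows through the one shared critic instance (illustrative dimensions).
n_agents, state_dim, action_dim = 4, 10, 1
actors = [Actor(state_dim, action_dim) for _ in range(n_agents)]
critic = SharedTwinCritic(n_agents * state_dim, n_agents * action_dim)

states = torch.randn(32, n_agents, state_dim)                        # batch of joint states
actions = torch.stack([a(states[:, i]) for i, a in enumerate(actors)], dim=1)
q1, q2 = critic(states.flatten(1), actions.flatten(1))
target_q = torch.min(q1, q2)                                         # clipped double-Q value
```

Because there is only one critic, its parameters and gradient updates are amortized over all agents, which is consistent with the abstract's claim of faster convergence and lower runtime cost.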