The Markov Decision Process (MDP) is a popular mathematical framework for modeling stochastic sequential decision problems under uncertainty. These models appear in many application areas, including computer science, engineering, telecommunications, and finance. One of the most challenging goals is complexity reduction for large MDPs. In this paper, we propose an optimal strategy for solving large MDPs under the discounted-reward criterion. The proposed approach is based on a combination of a decomposition technique and an efficient parallel strategy: the global MDP is split into several sub-MDPs, which are then classified by level according to the strongly connected components principle. A master-slave strategy based on the Message Passing Interface (MPI) is proposed to solve the resulting problem. The performance of the proposed approach is evaluated in terms of scalability, cost, and execution speed.
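To make the decomposition idea concrete, below is a minimal serial sketch (not the paper's MPI implementation) of splitting a small tabular MDP into sub-MDPs by strongly connected components and solving them level by level, downstream components first, under a discounted reward. The toy MDP, the discount factor GAMMA = 0.9, and the helper names `find_sccs` and `solve_by_levels` are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict

# Toy MDP (illustrative): state -> action -> list of (next_state, probability, reward)
mdp = {
    0: {'a': [(1, 1.0, 1.0)]},
    1: {'a': [(0, 0.5, 0.0), (2, 0.5, 2.0)]},
    2: {'a': [(3, 1.0, 0.0)]},
    3: {'a': [(2, 0.3, 1.0), (3, 0.7, 0.5)]},
}
GAMMA = 0.9  # discount factor (assumed)

def successors(s):
    """States reachable from s in one step under any action."""
    return {ns for outs in mdp[s].values() for ns, _, _ in outs}

def find_sccs(states):
    """Kosaraju's algorithm: list of SCCs in topological order of the condensation."""
    order, seen = [], set()
    def dfs(s):
        seen.add(s)
        for ns in successors(s):
            if ns not in seen:
                dfs(ns)
        order.append(s)
    for s in states:
        if s not in seen:
            dfs(s)
    rev = defaultdict(set)            # transposed transition graph
    for s in states:
        for ns in successors(s):
            rev[ns].add(s)
    sccs, seen = [], set()
    for s in reversed(order):
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(rev[u] - seen)
        sccs.append(comp)
    return sccs

def solve_by_levels(sccs):
    """Value iteration on each sub-MDP, processing downstream components first."""
    V = defaultdict(float)
    # Kosaraju yields source components first, so iterate backwards to start with
    # downstream components, whose values then act as fixed boundary values upstream.
    for comp in reversed(sccs):
        for _ in range(200):          # crude fixed-iteration convergence for the sketch
            for s in comp:
                V[s] = max(sum(p * (r + GAMMA * V[ns]) for ns, p, r in outs)
                           for outs in mdp[s].values())
    return dict(V)

if __name__ == '__main__':
    sccs = find_sccs(list(mdp))
    print('Sub-MDPs, downstream level first:', list(reversed(sccs)))
    print('Optimal values:', solve_by_levels(sccs))
```

Because downstream components are solved first, their value functions serve as fixed boundary values for the components that feed into them; in a master-slave MPI scheme such as the one outlined in the abstract, sub-MDPs belonging to the same level could plausibly be dispatched to slave processes and solved in parallel.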