Original Markov Decision Processes Research Articles

As outer space becomes increasingly congested, there exists a growing need for auxiliary spacecraft to perform support missions for existing satellites with guarantees for safety and mission success. We focus on a multispacecraft inspection mission, wherein a team of “deputy” spacecraft inspect a passive “chief” spacecraft by traveling to a set of inspection points while satisfying a set of safety constraints, namely, that they avoid aligning themselves with the sun, that they avoid colliding with one another, and that they avoid colliding with the chief. We model the deputy dynamics using the Clohessy–Wiltshire–Hill equations, and subsequently discretize the environment by exploiting elliptical natural motion trajectories. Using this finite state space, we construct a Markov decision process (MDP) model of the environment and determine the optimal sequence of inspection points for each deputy to visit by solving a vehicle routing problem. To ensure that the deputies satisfy the safety constraints, we form the product MDP of the original MDP and a nondeterministic Büchi automaton that encodes the sensing task and safety constraints. Using this product MDP, we propose a pair of decentralized algorithms that each seeks to minimize the weighted combination of the time and fuel required to safely complete the mission. The first is an offline algorithm that synthesizes a safe trajectory for each deputy that requires no communication at runtime, while the second is an online algorithm that enforces safety at runtime by leveraging communication between the deputies. We provide numerical examples demonstrating the efficacy of both proposed algorithms.
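The product construction mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: for simplicity it assumes a deterministic automaton rather than a nondeterministic Büchi automaton, and the names `product_mdp`, `transitions`, `labels`, `delta`, and the `"trap"` state are hypothetical. Each product state pairs an environment state with an automaton state, and any label the automaton does not accept sends the run to an absorbing trap state.

```python
from collections import deque

def product_mdp(transitions, labels, delta, s0, q0, trap="trap"):
    """Build the reachable part of the product of an MDP and an automaton.

    transitions: {(state, action): {next_state: probability}}
    labels:      {state: label read by the automaton on entering that state}
    delta:       {(automaton_state, label): next_automaton_state}
    Missing entries in delta are treated as violations and lead to `trap`.
    """
    start = (s0, q0)
    product = {}
    seen = {start}
    frontier = deque([start])
    while frontier:
        s, q = frontier.popleft()
        for (src, action), successors in transitions.items():
            if src != s:
                continue
            out = {}
            for s_next, prob in successors.items():
                # The automaton advances on the label of the state being entered.
                q_next = delta.get((q, labels[s_next]), trap)
                out[(s_next, q_next)] = prob
                if (s_next, q_next) not in seen:
                    seen.add((s_next, q_next))
                    frontier.append((s_next, q_next))
            product[((s, q), action)] = out
    return product, start

# Toy example (all values are illustrative assumptions).
transitions = {
    ("a", "go"): {"b": 0.9, "c": 0.1},
    ("b", "stay"): {"b": 1.0},
    ("c", "stay"): {"c": 1.0},
}
labels = {"a": "safe", "b": "goal", "c": "unsafe"}
delta = {("q0", "safe"): "q0", ("q0", "goal"): "q1", ("q1", "goal"): "q1"}

product, start = product_mdp(transitions, labels, delta, "a", "q0")
```

In this toy model, taking `go` from the start state reaches the goal with automaton state `q1` with probability 0.9 and the trap with probability 0.1, so a planner working on the product can reason about safety and task progress in one state space.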

In this article, we consider a parking lot that manages the charging processes of its parked electric vehicles (EVs). Upon arrival, each EV requests a certain amount of energy, which should be delivered before the EV’s departure. Coordinating the EVs’ charging rates is critical for smoothing the load profile of the parking lot, because inappropriate charging rates can cause sharp spikes and fluctuations in the load profile and thereby harm the power grid. Meanwhile, empirical studies show that many parking lots exhibit statistical patterns in EV dynamics. For example, most EVs arrive during rush hours. Therefore, in this article, we incorporate such patterns into charging rate coordination. Although the statistical patterns can be summarized from historical data, they are difficult to model analytically. As a result, we adopt a model-free deep reinforcement learning approach. We also take the latest continuous charging rate control technology into consideration. The decision variables are thus continuous, and a policy gradient algorithm is needed to perform reinforcement learning. Technically, we first formulate the problem as a Markov decision process (MDP) with unknown state transition probabilities. In deriving a deep policy gradient algorithm, the challenge lies in the inconsistent and state-dependent action space of the MDP model, which results from the constraint that EVs’ energy demands be satisfied before their scheduled departure. To tackle this challenge, we design a customized model for neural network training by extending the action space to be consistent and state independent, and we revise the reward function to penalize the neural network output if it falls outside the action space of the original MDP model. With this customized model, we then develop a deep policy gradient algorithm based on the proximal policy gradient framework. Numerical results show that our algorithm outperforms the benchmarks.
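The idea of extending the action space and penalizing out-of-range outputs can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the `EV` fields, the bound formulas in `feasible_range`, the absolute-distance penalty, and all parameter names are hypothetical. The network is allowed to output any rate, the output is clipped to the state-dependent feasible range, and the size of the correction is returned as a penalty to subtract from the reward.

```python
from dataclasses import dataclass

@dataclass
class EV:
    remaining_energy: float  # energy still requested (e.g. kWh)
    remaining_steps: int     # time steps until departure, at least 1

def feasible_range(ev, max_rate, dt=1.0):
    """State-dependent bounds on the charging rate for the current step."""
    # Upper bound: do not deliver more energy than was requested.
    upper = min(max_rate, ev.remaining_energy / dt)
    # Lower bound: the least that must be delivered now so that the rest
    # can still be delivered at max_rate in the remaining steps.
    lower = max(0.0, ev.remaining_energy / dt - max_rate * (ev.remaining_steps - 1))
    return lower, upper

def project_and_penalize(raw_rate, ev, max_rate, penalty_weight=1.0, dt=1.0):
    """Clip the network output to the feasible range and compute a penalty."""
    lower, upper = feasible_range(ev, max_rate, dt)
    applied = min(max(raw_rate, lower), upper)
    penalty = penalty_weight * abs(raw_rate - applied)
    return applied, penalty

# Illustrative values: 10 units of energy due in 2 steps, max rate 6 per step.
ev = EV(remaining_energy=10.0, remaining_steps=2)
low_rate, low_penalty = project_and_penalize(2.0, ev, max_rate=6.0)
ok_rate, ok_penalty = project_and_penalize(5.0, ev, max_rate=6.0)
```

Because the extended action space is the same in every state, a standard continuous-action policy network can be trained on it, while the penalty discourages outputs that the original MDP would not allow.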

Articles published on Original Markov Decision Processes

Relaxed Equilibria for Time-Inconsistent Markov Decision Processes

Managing Resources for Shared Micromobility: Approximate Optimality in Large-Scale Systems

Sample Efficient Deep Reinforcement Learning With Online State Abstraction and Causal Transformer Model Prediction

Trajectory Synthesis for the Coordinated Inspection of a Spacecraft with Safety Guarantees

Dynamic Clustering and Resource Allocation Using Deep Reinforcement Learning for Smart-Duplex Networks

Data-Driven Coordinated Charging for Electric Vehicles With Continuous Charging Rates: A Deep Policy Gradient Approach

Data-Driven Radar Selection and Power Allocation Method for Target Tracking in Multiple Radar System

Solving K-MDPs

Optimal Energy Cooperation Policy in Fusion Center-Based Sustainable Wireless Sensor Networks

Age of Information Aware Radio Resource Management in Vehicular Networks: A Proactive Deep Reinforcement Learning Perspective

Easy Affine Markov Decision Processes

Dynamic scheduling for wireless multicast in massive MIMO HetNet

Countable state Markov decision processes with unbounded jump rates and discounted cost: optimality equation and approximations

Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Stochastic eco-routing in a signalized traffic network

Reachability-based model reduction for Markov decision process

Stochastic Eco-routing in a Signalized Traffic Network

Dynamic Request Routing for Online Video-on-Demand Service: A Markov Decision Process Approach

Efficient Abstraction Selection in Reinforcement Learning
