Unmanned aerial vehicles (UAVs) are advanced flight systems, but their positioning systems accumulate distance-dependent errors during flight. This study addresses the UAV path planning problem with positioning error correction (UPEC) using an end-to-end method. Traditional methods struggle to balance solution quality against computational overhead and often make limited use of scenario information. To overcome these issues, we propose a path planning model (PPM) based on deep reinforcement learning to solve the UPEC. The model has a complete structure comprising a mathematical model, feature engineering, a solution process, a neural policy network, scenario generation, a training process, and a test solution mechanism. Specifically, we first formulate UPEC as a Markov decision process (MDP) and apply feature engineering with effective features to support decision-making. We then introduce a path planning neural network (PPNN) to represent the MDP policy. Using a dataset generated through multi-rule combination validation, we train the PPNN with the proposed RL algorithm with a storage pool. Furthermore, we propose a backtracking mechanism that guarantees solution feasibility during the construction process. Extensive experiments demonstrate that the proposed PPM outperforms existing state-of-the-art algorithms in both solution quality and timeliness, and that the backtracking mechanism effectively improves the scenario completion rate. Model analysis further confirms the efficacy of our training algorithm and the generalisation ability of the PPNN. Moreover, our construction process is problem-tailored and better suited to UPEC than iterative search algorithms, as it effectively mitigates the impact of invalid nodes.
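To make the construction-with-backtracking idea concrete, below is a minimal Python sketch of a greedy constructive search that backtracks when no feasible successor remains. It is illustrative only: the callables `feasible` (standing in for the accumulated positioning-error constraints) and `score` (standing in for the PPNN's learned action preferences) are placeholders, not the paper's actual components.

```python
def construct_path(neighbors, start, goal, feasible, score):
    """Greedy constructive search with backtracking (illustrative sketch).

    neighbors: dict mapping node -> iterable of successor nodes
    feasible:  callable(path, nxt) -> bool; rejects successors that would
               violate the accumulated-error constraints (assumed interface)
    score:     callable(path, nxt) -> float; stands in for the policy
               network's action preference (higher is better)
    """
    visited = {start}
    path = [start]

    def extend():
        node = path[-1]
        if node == goal:
            return True
        # Rank feasible candidates by the stand-in policy score, best first.
        cands = sorted(
            (n for n in neighbors.get(node, ())
             if n not in visited and feasible(path, n)),
            key=lambda n: score(path, n),
            reverse=True,
        )
        for nxt in cands:
            visited.add(nxt)
            path.append(nxt)
            if extend():
                return True
            # Backtrack: undo the dead-end choice and try the next candidate.
            path.pop()
            visited.discard(nxt)
        return False

    return path if extend() else None


# Toy usage: 'B' is a dead end, so the search backtracks and routes via 'C'.
g = {"A": ["B", "C"], "B": [], "C": ["D"]}
print(construct_path(g, "A", "D",
                     feasible=lambda p, n: True,
                     score=lambda p, n: 0.0))  # -> ['A', 'C', 'D']
```

In this sketch, backtracking plays the role the abstract describes: a partially constructed path that cannot be feasibly extended is unwound rather than discarded, so the construction process avoids terminating at invalid nodes.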