Deterministic policies based on maximum regrets in MDPs with imprecise rewards

Pegah Alizadeh, Emiliano Traversi, Aomar Osmani

doi:10.3233/aic-190632

Abstract

Markov decision processes (MDPs) are a powerful tool for planning tasks and sequential decision-making problems. In this work we deal with MDPs with imprecise rewards, which are often used when the data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method to calculate an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because, for a given state, they provide a unique choice. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of “determinizing” the optimal stochastic policy leads to a policy far from the exact deterministic policy.
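As an illustration of the minimax-regret criterion described in the abstract, the following is a minimal sketch and not the authors' algorithm. It assumes a tiny MDP whose reward uncertainty is represented by a finite list of candidate reward tables (for example, the vertices of a reward polytope), and it finds the deterministic policy with the smallest maximum regret by brute-force enumeration, which is only feasible for very small problems. All names (`policy_value`, `max_regret`, `minimax_regret_deterministic`) and the array layouts are hypothetical choices made for this sketch.

```python
import itertools

import numpy as np


def policy_value(P, r, policy, gamma, beta):
    """Expected discounted value of a deterministic policy under reward table r.

    P has shape (S, A, S), r has shape (S, A), policy is a tuple of S action
    indices, and beta is the initial state distribution (assumed layouts).
    """
    n = P.shape[0]
    # Transition matrix and reward vector induced by the policy.
    P_pi = np.array([P[s, policy[s]] for s in range(n)])
    r_pi = np.array([r[s, policy[s]] for s in range(n)])
    # Solve the linear system (I - gamma * P_pi) v = r_pi for the state values.
    v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    return float(beta @ v)


def max_regret(P, rewards, policy, gamma, beta, all_policies):
    """Worst-case loss of `policy` against the best policy for each candidate reward."""
    worst = 0.0
    for r in rewards:
        # For a fixed reward, some deterministic policy is optimal,
        # so enumerating deterministic policies suffices here.
        best = max(policy_value(P, r, p, gamma, beta) for p in all_policies)
        worst = max(worst, best - policy_value(P, r, policy, gamma, beta))
    return worst


def minimax_regret_deterministic(P, rewards, gamma, beta):
    """Return (max regret, policy) for the best deterministic policy by enumeration."""
    n_states, n_actions = P.shape[0], P.shape[1]
    all_policies = list(itertools.product(range(n_actions), repeat=n_states))
    scored = [
        (max_regret(P, rewards, p, gamma, beta, all_policies), p)
        for p in all_policies
    ]
    return min(scored)
```

For instance, with a single state, two actions, discount 0.5, and candidate rewards (1, 0) and (0, 3) for the two actions, the first action has maximum regret 6 and the second has maximum regret 2, so the sketch selects the second action. The enumeration grows exponentially with the number of states, which is one reason an exact optimization-based procedure of the kind the abstract refers to is of interest.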

Journal: AI Communications
