Online Markov Decision Processes Configuration with Continuous Decision Space

Davide Maran, Pierriccardo Olivieri, Marcello Restelli, Francesco Emanuele Stradi, Giuseppe Urso, Nicola Gatti

doi:10.1609/aaai.v38i13.29344

Abstract

In this paper, we investigate the optimal online configuration of episodic Markov decision processes when the space of possible configurations is continuous. Specifically, we study the interaction between a learner (referred to as the configurator) and an agent with a fixed, unknown policy, when the learner aims to minimize her losses by choosing transition functions in an online fashion. The losses may be unrelated to the agent's rewards. This problem applies to many real-world scenarios where the learner seeks to manipulate the Markov decision process to her advantage. We study both deterministic and stochastic settings, where the losses are either fixed or sampled from an unknown probability distribution. We design two algorithms whose distinguishing feature is their reliance on occupancy measures to optimistically explore the continuous space of transition functions, achieving constant regret in deterministic settings and sublinear regret in stochastic settings, respectively. Moreover, we prove that the regret bound is tight with respect to any constant factor in deterministic settings. Finally, we compare the empirical performance of our algorithms with a baseline in synthetic experiments.
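To make the role of occupancy measures concrete, the following is a minimal illustrative sketch and not the paper's algorithms. It assumes a small finite MDP, a stationary agent policy that is known for the purpose of the example (in the paper the policy is unknown), and a fixed loss per state-action pair. The names `occupancy_measure`, `expected_loss`, `P_a`, and `P_b` are hypothetical. The sketch shows how a choice of transition function induces an occupancy measure, and how the configurator's expected loss is linear in that measure.

```python
def occupancy_measure(mu, policy, P, horizon):
    """Return q[h][s][a] = Pr(s_h = s, a_h = a).

    mu[s]        : initial state distribution (assumed known here)
    policy[s][a] : agent's stationary policy (assumed known in this sketch)
    P[s][a][s2]  : transition function chosen by the configurator
    """
    n_states, n_actions = len(mu), len(policy[0])
    state_dist = list(mu)
    q = []
    for _ in range(horizon):
        # Joint probability of being in s and playing a at this step.
        q_h = [[state_dist[s] * policy[s][a] for a in range(n_actions)]
               for s in range(n_states)]
        q.append(q_h)
        # Push the state-action distribution through the transitions.
        state_dist = [
            sum(q_h[s][a] * P[s][a][s2]
                for s in range(n_states) for a in range(n_actions))
            for s2 in range(n_states)
        ]
    return q


def expected_loss(q, loss):
    """Expected episode loss: inner product of occupancy measure and losses."""
    return sum(q_h[s][a] * loss[s][a]
               for q_h in q
               for s in range(len(loss))
               for a in range(len(loss[0])))


# Hypothetical two-state, two-action example with horizon 2.
mu = [1.0, 0.0]
policy = [[0.5, 0.5], [1.0, 0.0]]
loss = [[0.0, 0.0], [1.0, 1.0]]  # the configurator pays only in state 1

# Two candidate configurations: P_a rarely moves to state 1, P_b often does.
P_a = [[[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.9, 0.1]]]
P_b = [[[0.2, 0.8], [0.2, 0.8]], [[0.2, 0.8], [0.2, 0.8]]]

q_a = occupancy_measure(mu, policy, P_a, horizon=2)
q_b = occupancy_measure(mu, policy, P_b, horizon=2)
loss_a = expected_loss(q_a, loss)
loss_b = expected_loss(q_b, loss)
```

In this toy instance the configurator would prefer `P_a`, since it yields the smaller expected loss. The sketch only evaluates two fixed candidates; searching a continuous space of transition functions without knowing the agent's policy is the problem the paper addresses.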


Published in: Proceedings of the AAAI Conference on Artificial Intelligence
