Entropy Maximization for Partially Observable Markov Decision Processes

Yagiz Savas, Ufuk Topcu, Michael Hibbard, Bo Wu, Takashi Tanaka

doi:10.1109/tac.2022.3183564

Abstract

We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent’s trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. Focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In a numerical example, we highlight the benefit of using an entropy-maximizing FSC compared with an FSC that simply finds a feasible policy for accomplishing a task.
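To make the objective concrete, the following minimal Python sketch computes the trajectory entropy of a small Markov chain, which is what a pMC becomes once its parameters are fixed. It is not the synthesis algorithm from the paper. It assumes the standard decomposition for a chain that reaches an absorbing state with probability 1: the entropy equals the sum, over transient states, of the expected number of visits times the local entropy of that state's outgoing distribution. The function names, the three-state chain, and the fixed-point solver are all illustrative assumptions.

```python
import math


def local_entropy(row):
    """Shannon entropy (in bits) of one row of a transition matrix."""
    return -sum(p * math.log2(p) for p in row if p > 0.0)


def expected_visits(P, init, transient, iters=10_000, tol=1e-12):
    """Expected number of visits to each transient state.

    Solves xi = init + P_T^T xi by fixed-point iteration, which converges
    when every transient state eventually leads to absorption.
    """
    xi = {s: init.get(s, 0.0) for s in transient}
    for _ in range(iters):
        new = {
            s: init.get(s, 0.0) + sum(xi[r] * P[r][s] for r in transient)
            for s in transient
        }
        if max(abs(new[s] - xi[s]) for s in transient) < tol:
            return new
        xi = new
    return xi


def chain_entropy(P, init, transient):
    """Trajectory entropy (bits): sum of expected visits times local entropy."""
    xi = expected_visits(P, init, transient)
    return sum(xi[s] * local_entropy(P[s]) for s in transient)


# Hypothetical chain: states 0 and 1 are transient, state 2 is absorbing.
P_random = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0],
]
# A deterministic chain over the same states, for comparison.
P_deterministic = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

start = {0: 1.0}  # the chain starts in state 0 with probability 1
print(chain_entropy(P_random, start, [0, 1]))
print(chain_entropy(P_deterministic, start, [0, 1]))
```

The randomized chain yields a strictly positive entropy, while the deterministic one yields zero, since an observer can predict its single trajectory exactly. In the setting of the abstract, the transition probabilities would instead depend on the FSC's parameters, and the synthesis problem is to choose those parameters to increase this quantity while meeting the reward constraint.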

Journal: IEEE Transactions on Automatic Control
Publication Date: Dec 1, 2022
Citations: 1

Similar Papers

Unpredictable Planning Under Partial Observability
Michael Hibbard ... Bo Wu
01 Dec 2019

Correct-by-construction policies for POMDPs
Nils Jansen ... Sebastian Junges
15 Apr 2019

Generalized Controllers in POMDP Decision-Making
Kyle Hollins Wray ... Shlomo Zilberstein
01 May 2019

Safe Policy Improvement for POMDPs via Finite-State Controllers
Thiago D. Simão ... Nils Jansen
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 37
26 Jun 2023
