Joint learning of reward machines and policies in environments with partially known semantics

Christos K Verginis, Cevahir Koprulu, Sandeep Chinchali, Ufuk Topcu

doi:10.1016/j.artint.2024.104146

Abstract

We study reinforcement learning for a task encoded by a reward machine. The task is defined over a set of properties of the environment, called atomic propositions, which are represented by Boolean variables. A common but unrealistic assumption in the literature is that the truth values of these propositions are accurately known. In practice, these truth values are uncertain because they come from imperfect sensors. At the same time, reward machines can be difficult to model explicitly, especially when they encode complicated tasks. We develop a reinforcement-learning algorithm that infers a reward machine encoding the underlying task while learning how to execute it, despite the uncertainty in the propositions' truth values. To address this uncertainty, the algorithm maintains a probabilistic estimate of the truth values of the atomic propositions and updates this estimate according to new sensory measurements obtained while exploring the environment. The algorithm also maintains a hypothesis reward machine, which serves as an estimate of the reward machine that encodes the task to be learned. As the agent explores the environment, the algorithm updates the hypothesis reward machine according to the obtained rewards and the estimated truth values of the atomic propositions. Finally, the algorithm uses a Q-learning procedure over the states of the hypothesis reward machine to determine an optimal policy that accomplishes the task. We prove that the algorithm successfully infers the reward machine and asymptotically learns a policy that accomplishes the respective task.
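Two of the ingredients named in the abstract, the probabilistic estimate of a proposition's truth value and Q-learning over reward-machine states, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a binary sensor with known true-positive and false-positive rates (`p_true_pos` and `p_false_pos`, both hypothetical parameters) and a standard tabular Q-learning update keyed by a pair of reward-machine state and environment state. The function names, the sensor model, and the default values are all assumptions made for illustration, and the inference of the hypothesis reward machine itself is not shown.

```python
from collections import defaultdict


def update_belief(prior, observation, p_true_pos=0.9, p_false_pos=0.2):
    """Bayes update of P(proposition is true) after one noisy Boolean reading.

    Assumed sensor model (hypothetical, not taken from the paper):
    the sensor reports True with probability p_true_pos when the
    proposition holds and with probability p_false_pos when it does not.
    """
    like_true = p_true_pos if observation else 1.0 - p_true_pos
    like_false = p_false_pos if observation else 1.0 - p_false_pos
    evidence = like_true * prior + like_false * (1.0 - prior)
    return like_true * prior / evidence


def q_update(q, rm_state, env_state, action, reward,
             next_rm_state, next_env_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on the product of reward-machine
    state and environment state (standard update rule)."""
    best_next = max(q[(next_rm_state, next_env_state, a)] for a in actions)
    key = (rm_state, env_state, action)
    q[key] += alpha * (reward + gamma * best_next - q[key])
    return q[key]


# Belief about one atomic proposition after two conflicting readings.
belief = 0.5
belief = update_belief(belief, True)
belief = update_belief(belief, False)

# One Q-learning step; unseen entries default to 0.0.
q_table = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(q_table, "u0", (0, 0), "right", 1.0,
                     "u1", (0, 1), actions)
```

In this sketch the belief would feed the label that drives reward-machine transitions, for example by thresholding it. How the published algorithm couples the estimate to the hypothesis reward machine is described in the paper, not here.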

Journal: Artificial Intelligence
Publication Date: May 23, 2024
License type: CC BY

