Abstract

In many decision-making problems, a well-designed reward function is required to correctly guide agents toward desirable behavior. For example, an intelligent robot needs to check its power before sweeping. Such a reward function depends on the history of states rather than only the current state, and is referred to as a non-Markovian reward. However, state-of-the-art MDP (Markov Decision Process) planners support only Markovian rewards. In this paper, we present an approach to transform non-Markovian rewards expressed in $LTL_f$ (Linear Temporal Logic over Finite Traces) into Markovian rewards. The $LTL_f$ formula is first converted into an automaton, which is then compiled into a standard MDP model. The reward function of the resulting model is further refined through reward shaping in order to speed up planning. The reshaped reward function can be exploited by MDP planners to guide search and improve training results. Finally, experiments on augmented International Probabilistic Planning Competition (IPPC) domains demonstrate the effectiveness and feasibility of our approach; in particular, the reshaped reward function significantly improves planner performance.
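To illustrate the compilation idea described above, the following minimal Python sketch pairs a deterministic finite automaton (derived, in the paper's setting, from an $LTL_f$ formula) with the environment state, so that the originally history-dependent reward becomes a function of the current product state, and applies potential-based reward shaping on top. The automaton, its transitions, the action names, and the potential values are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical DFA for the property "the robot sweeps only after checking its power":
# state 0 = power not yet checked, 1 = power checked, 2 = violation (absorbing).
DFA_TRANSITIONS = {
    (0, "check_power"): 1,
    (0, "sweep"): 2,          # sweeping before checking violates the property
    (1, "check_power"): 1,
    (1, "sweep"): 1,
    (2, "check_power"): 2,
    (2, "sweep"): 2,
}
ACCEPTING = {1}

@dataclass(frozen=True)
class ProductState:
    """State of the compiled (product) MDP: environment state x DFA state."""
    mdp_state: str
    dfa_state: int

def step_product(state: ProductState, action: str, next_mdp_state: str) -> ProductState:
    """Advance the DFA on the chosen action and pair it with the next MDP state."""
    next_dfa = DFA_TRANSITIONS.get((state.dfa_state, action), state.dfa_state)
    return ProductState(next_mdp_state, next_dfa)

def markovian_reward(state: ProductState) -> float:
    """Reward now depends only on the current product state, not on the history."""
    return 1.0 if state.dfa_state in ACCEPTING else 0.0

def potential(state: ProductState) -> float:
    """Hypothetical shaping potential: DFA states closer to acceptance score higher."""
    return {0: 0.0, 1: 1.0, 2: -1.0}[state.dfa_state]

def shaped_reward(state: ProductState, next_state: ProductState, gamma: float = 0.99) -> float:
    """Potential-based reward shaping, which preserves optimal policies."""
    return markovian_reward(next_state) + gamma * potential(next_state) - potential(state)

# Example: the robot checks its power, then sweeps; the property is satisfied.
s = ProductState("docked", 0)
s = step_product(s, "check_power", "docked")
s = step_product(s, "sweep", "room_a")
print(markovian_reward(s))  # 1.0
```

In this sketch the shaping term rewards progress through the automaton, which mirrors how a reshaped reward can guide an MDP planner toward satisfying the temporal property earlier in the search.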
