Abstract

In many decision-making problems, a well-designed reward function is required to correctly guide agents toward desirable behavior. For example, an intelligent robot needs to check its power before sweeping. Such a reward function depends on the history of states rather than only the current state, and is referred to as a non-Markovian reward. However, state-of-the-art MDP (Markov Decision Process) planners support only Markovian rewards. In this paper, we present an approach to transform non-Markovian rewards expressed in $LTL_f$ (Linear Temporal Logic over Finite Traces) into Markovian rewards. The $LTL_f$ formula is first converted into an automaton, which is then compiled into a standard MDP model. The reward function of the resulting model is further refined through reward shaping in order to speed up planning. The reshaped reward function can be exploited by MDP planners to guide search and improve training results. Finally, experiments on augmented International Probabilistic Planning Competition (IPPC) domains demonstrate the effectiveness and feasibility of our approach; in particular, the reshaped reward function significantly improves planner performance.
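To illustrate the compilation idea described above, the following minimal Python sketch pairs a deterministic finite automaton (derived, in the paper's setting, from an $LTL_f$ formula) with the environment state, so that the originally history-dependent reward becomes a function of the current product state, and applies potential-based reward shaping on top. The automaton, its transitions, the action names, and the potential values are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical DFA for the property "the robot sweeps only after checking its power":
# state 0 = power not yet checked, 1 = power checked, 2 = violation (absorbing).
DFA_TRANSITIONS = {
    (0, "check_power"): 1,
    (0, "sweep"): 2,          # sweeping before checking violates the property
    (1, "check_power"): 1,
    (1, "sweep"): 1,
    (2, "check_power"): 2,
    (2, "sweep"): 2,
}
ACCEPTING = {1}

@dataclass(frozen=True)
class ProductState:
    """State of the compiled (product) MDP: environment state x DFA state."""
    mdp_state: str
    dfa_state: int

def step_product(state: ProductState, action: str, next_mdp_state: str) -> ProductState:
    """Advance the DFA on the chosen action and pair it with the next MDP state."""
    next_dfa = DFA_TRANSITIONS.get((state.dfa_state, action), state.dfa_state)
    return ProductState(next_mdp_state, next_dfa)

def markovian_reward(state: ProductState) -> float:
    """Reward now depends only on the current product state, not on the history."""
    return 1.0 if state.dfa_state in ACCEPTING else 0.0

def potential(state: ProductState) -> float:
    """Hypothetical shaping potential: DFA states closer to acceptance score higher."""
    return {0: 0.0, 1: 1.0, 2: -1.0}[state.dfa_state]

def shaped_reward(state: ProductState, next_state: ProductState, gamma: float = 0.99) -> float:
    """Potential-based reward shaping, which preserves optimal policies."""
    return markovian_reward(next_state) + gamma * potential(next_state) - potential(state)

# Example: the robot checks its power, then sweeps; the property is satisfied.
s = ProductState("docked", 0)
s = step_product(s, "check_power", "docked")
s = step_product(s, "sweep", "room_a")
print(markovian_reward(s))  # 1.0
```

In this sketch the shaping term rewards progress through the automaton, which mirrors how a reshaped reward can guide an MDP planner toward satisfying the temporal property earlier in the search.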
