Unlike tasks in the standard Reinforcement Learning (RL) setting, many real-world tasks are non-Markovian: they require long-term memory of past states and dependencies among them, which makes them considerably harder to solve than Markovian tasks. In this paper, we propose a novel RL approach for training agents on non-Markovian tasks expressed in the temporal logic LTLf (Linear Temporal Logic over Finite Traces). To this end, we introduce an encoding of linear complexity from LTLf into MDPs (Markov Decision Processes) so that advanced RL algorithms can be leveraged. We further propose an experience classification method based on the automaton structure (which is theoretically equivalent to the LTLf specification), together with an automatic reward shaping technique and a prioritized experience replay mechanism that cooperate with the classification method to improve the performance of RL algorithms. We provide empirical evaluations on two widely used benchmark problems, Waterworld and Cartpole, both augmented with complex non-Markovian tasks. The evaluations are conducted with respect to training speed, policy quality, convergence rate, computational efficiency, and scalability. The experimental results show that our approach achieves superior performance over other relevant studies, in particular an average improvement of 133% in convergence rate and a reduction of 11% in training time.
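The abstract only names the experience classification and prioritized replay components; the following is a minimal, illustrative Python sketch of how such a combination could look, assuming a proportional prioritized buffer and a hypothetical classification rule that boosts the priority of transitions which advance the automaton tracking the LTLf task. All identifiers (`PrioritizedReplayBuffer`, `classify_and_store`, `progress_priority`) are assumptions for illustration, not the paper's actual implementation.

```python
import random
from collections import deque


class PrioritizedReplayBuffer:
    """Simple proportional prioritized replay buffer (illustrative sketch only)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def add(self, transition, priority):
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to their priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idxs]


def classify_and_store(buffer, transition, dfa_state_before, dfa_state_after,
                       accepting_states, progress_priority=2.0, default_priority=1.0):
    """Hypothetical classification rule: give a higher replay priority to
    experiences that change the state of the automaton encoding the LTLf task
    (i.e., make progress toward the non-Markovian goal)."""
    if dfa_state_after in accepting_states or dfa_state_after != dfa_state_before:
        priority = progress_priority   # transition advanced the automaton
    else:
        priority = default_priority    # no progress on the LTLf specification
    buffer.add(transition, priority)
```

In this sketch, transitions classified as "progressing" are replayed more often, which is one plausible way an automaton-based classification could cooperate with prioritized replay as the abstract describes.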