Abstract

The Dyna-Q algorithm is a model-based reinforcement learning method in which the learning agent not only interacts with the environment to learn an optimal policy but also builds a model of the environment simultaneously. This environmental model is introduced to compensate for the shortage of online samples. To enhance the accuracy of the model, this paper proposes a model shaping method that uses neighboring-state information to compensate for states that are scarcely visited. Once an accurate model is acquired, many virtual experiences are sampled from the shaped model to perform indirect learning; how to use the model to speed up learning, however, remains an important issue. The Dyna-Q algorithm based on prioritized sweeping can be regarded as a breadth-first search method; to increase its learning speed, this paper introduces a depth-first search method that applies techniques from ant colony algorithms, using an exploration factor to select candidate state-action pairs in indirect learning. The strategy is further developed into a hybrid planning approach that proportionally interleaves executions of depth-first planning and breadth-first planning. To verify the validity and applicability of the proposed method, simulations on the mountain car and maze problems are conducted. The results show that the proposed method achieves both sample efficiency and learning acceleration for the Dyna-Q learning algorithm.
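For concreteness, the following is a minimal sketch of the tabular Dyna-Q loop the abstract builds on: direct Q-learning from real transitions, a learned one-step model, and a fixed number of planning updates replayed from that model. The environment interface (reset, step), the action list, and all hyperparameter names are illustrative assumptions, not the paper's notation.

import random
from collections import defaultdict

# Minimal tabular Dyna-Q sketch. The env interface and the
# hyperparameters are illustrative assumptions, not the paper's notation.
def dyna_q(env, actions, episodes=200, alpha=0.1, gamma=0.95,
           epsilon=0.1, n_planning=10):
    Q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Direct RL: epsilon-greedy action on the real environment
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # Model learning: record the observed one-step transition
            model[(s, a)] = (r, s2, done)

            # Indirect RL (planning): replay transitions sampled from the model
            for _ in range(n_planning):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * Q[(ps2, greedy(ps2))])
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q

Here the planning step samples uniformly from the model; the paper's contribution lies in replacing this uniform selection with more informed search strategies.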
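The breadth-first baseline the abstract refers to is prioritized sweeping, which propagates value changes outward from a state to its predecessors via a priority queue. A standard textbook-style sketch follows; the threshold theta, the predecessor bookkeeping, and the function names are assumptions (Q and model are the tables from the sketch above), and the paper's depth-first and hybrid planners would replace the selection rule used here.

import heapq
import itertools

_tie = itertools.count()  # tie-breaker so the heap never compares raw states

def push(pqueue, priority, s, a, theta=1e-4):
    # Queue a state-action pair for a backup if its TD error is large enough
    if priority > theta:
        heapq.heappush(pqueue, (-priority, next(_tie), (s, a)))

def prioritized_sweeping(Q, model, predecessors, pqueue, actions,
                         alpha=0.1, gamma=0.95, n_planning=10):
    # Pop the pair with the largest |TD error| first, back it up, then
    # re-examine its predecessors: an outward, breadth-first propagation.
    for _ in range(n_planning):
        if not pqueue:
            break
        _, _, (s, a) = heapq.heappop(pqueue)
        r, s2, done = model[(s, a)]
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

        # Predecessors (ps, pa) are pairs whose modeled transition leads to s
        for ps, pa in predecessors.get(s, ()):
            pr, _, pdone = model[(ps, pa)]
            ptarget = pr + (0.0 if pdone else gamma * max(Q[(s, b)] for b in actions))
            push(pqueue, abs(ptarget - Q[(ps, pa)]), ps, pa)

By contrast, the paper's depth-first planner follows chains of transitions guided by an ant-colony-style exploration factor, and its hybrid approach proportionally interleaves the two styles of planning.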
