Abstract

To scale reinforcement learning up to large and complex problems, we propose an approach that partitions a large state space into multiple smaller state spaces based on critical states, thereby decomposing the learning task. During learning, we record every training episode and eliminate the state loops it contains. We observe that some states appear with high probability (even probability 1) in all of these acyclic episodes; we call them critical states. In other words, according to the learned experience, an agent that wants to reach the goal state will pass through these critical states with high probability. The critical states can therefore be used to partition the state space so that the learning task is accomplished in stages. We also prove that the optimal policy found in the partitioned smaller state spaces is equivalent to the optimal policy found in the original state space. Experimental comparisons between Q-learning and Q-learning with critical states demonstrate that our approach is more effective. More importantly, our approach sheds light on how an agent can use its own experience to plan its learning for better performance.

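The abstract describes a pipeline of recording episodes, removing state loops, and selecting states that recur across the resulting acyclic episodes. The sketch below illustrates one plausible reading of that idea; the function names `remove_loops` and `critical_states`, the frequency `threshold`, and the example episodes are illustrative assumptions, not the authors' actual implementation.

```python
from collections import Counter

def remove_loops(episode):
    """Collapse state loops: if a state reappears, cut the episode back
    to the first visit of that state, keeping a single copy."""
    acyclic = []
    first_index = {}
    for state in episode:
        if state in first_index:
            # drop everything after the first occurrence of this state
            acyclic = acyclic[: first_index[state] + 1]
            first_index = {s: i for i, s in enumerate(acyclic)}
        else:
            first_index[state] = len(acyclic)
            acyclic.append(state)
    return acyclic

def critical_states(episodes, threshold=1.0):
    """Return states whose per-episode visit frequency across the
    acyclic episodes is at least `threshold` (assumed criterion)."""
    acyclic_episodes = [remove_loops(ep) for ep in episodes]
    counts = Counter()
    for ep in acyclic_episodes:
        counts.update(set(ep))  # count each state at most once per episode
    n = len(acyclic_episodes)
    return {s for s, c in counts.items() if c / n >= threshold}

# Hypothetical recorded episodes (sequences of state ids ending in goal "G").
episodes = [
    ["A", "B", "C", "B", "C", "D", "G"],
    ["A", "C", "D", "G"],
    ["A", "E", "C", "D", "G"],
]
print(critical_states(episodes))  # -> {'A', 'C', 'D', 'G'}
```

Under this reading, the returned critical states would serve as subgoal boundaries for partitioning the state space, with a separate value function or policy learned for each stage.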