Abstract

A model-based reinforcement learning (RL) method that interleaves direct and indirect learning to update $Q$ functions is proposed. The environment is approximated by a virtual model that predicts the transition to the next state and the reward of the domain, and this virtual model is used to train the $Q$ functions, thereby accelerating policy learning. Lookup-table methods are commonly used to build such environment models, but they require an enormous amount of experience to enumerate the responses of the environment. In this paper, a stochastic model-learning method based on tree structures is presented, and an online clustering method is applied so that the learned model can estimate transition probabilities. Using the virtual model, the RL method generates simulated experience in the indirect-learning stage. Simulated transitions and backups are focused most usefully by working backward from the state-action pairs whose estimated $Q$ values have changed significantly: the useful one-step backups are those actions that lead directly into a state whose value has already changed noticeably. This focusing, however, can produce false positives; a backup state may be invalid, such as an absorbing or terminal state, particularly because the $Q$-value changes computed at the planning stage must still be fed back for priority ranking even though they derive from simulated, possibly erroneous experience. When the agent is drawn to generate simulated experience around such absorbing states, learning efficiency deteriorates. This paper proposes three detection methods to solve this problem and further speed up policy learning. The effectiveness and generality of the method are demonstrated in three numerical simulations, whose results show a clear improvement in training rate.
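The interplay the abstract describes is close in spirit to Dyna-style planning with prioritized sweeping: real transitions update both the $Q$ function and the model (direct learning), while a priority queue replays simulated transitions backward from the pairs whose values changed most (indirect learning). The sketch below illustrates that loop in a tabular setting; the dictionary model, the names (`learn_step`, `plan`, `THETA`), and the terminal-state check are illustrative assumptions, not the paper's tree-structured stochastic model or its three detection methods.

```python
# Minimal sketch of Dyna-style planning with prioritized sweeping, assuming a
# tabular domain. The dictionary model stands in for the paper's tree-based
# stochastic model; the "done" check is a stand-in for its detection methods.
import heapq
from collections import defaultdict

ALPHA, GAMMA, THETA = 0.1, 0.95, 1e-4  # step size, discount, priority threshold

Q = defaultdict(float)            # Q[(s, a)] -> value estimate
model = {}                        # model[(s, a)] -> (reward, next_state, done)
predecessors = defaultdict(set)   # next_state -> {(s, a)} observed to reach it
pqueue = []                       # max-priority queue of (-priority, (s, a))

def learn_step(s, a, r, s2, done, actions):
    """Direct learning: one real transition updates the model and priorities."""
    model[(s, a)] = (r, s2, done)
    predecessors[s2].add((s, a))
    target = r if done else r + GAMMA * max(Q[(s2, b)] for b in actions)
    priority = abs(target - Q[(s, a)])
    if priority > THETA:
        heapq.heappush(pqueue, (-priority, (s, a)))

def plan(n_steps, actions):
    """Indirect learning: replay simulated experience, sweeping backward from
    the state-action pairs whose Q values changed the most."""
    for _ in range(n_steps):
        if not pqueue:
            break
        _, (s, a) = heapq.heappop(pqueue)
        r, s2, done = model[(s, a)]
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        # Work backward: re-rank predecessors of s, skipping backups that would
        # chain through absorbing/terminal transitions -- the false positives
        # the paper's detection methods are designed to filter out.
        for (ps, pa) in predecessors[s]:
            pr, _, pdone = model[(ps, pa)]
            if pdone:
                continue
            p_target = pr + GAMMA * max(Q[(s, b)] for b in actions)
            priority = abs(p_target - Q[(ps, pa)])
            if priority > THETA:
                heapq.heappush(pqueue, (-priority, (ps, pa)))
```

In this sketch, `plan(n_steps, actions)` would be called after each real step; increasing `n_steps` trades computation for faster value propagation, which is the acceleration the abstract attributes to indirect learning.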
