Abstract
Stochastic exploration is key to the success of the deep Q-network (DQN) algorithm. However, most existing stochastic exploration approaches either explore actions heuristically, regardless of their Q values, or couple the sampling with Q values, which inevitably introduces bias into the learning process. In this article, we propose a novel preference-guided ϵ-greedy exploration algorithm that efficiently facilitates exploration for DQN without introducing additional bias. Specifically, we design a dual architecture consisting of two branches: one is a copy of DQN, namely, the Q branch; the other, which we call the preference branch, learns the action preference that the DQN implicitly follows. We theoretically prove that the policy improvement theorem holds for the preference-guided ϵ-greedy policy and experimentally show that the inferred action preference distribution aligns with the landscape of the corresponding Q values. Intuitively, preference-guided ϵ-greedy exploration motivates the DQN agent to take diverse actions: actions with larger Q values are sampled more frequently, while those with smaller Q values still have a chance to be explored, thus encouraging exploration. We comprehensively evaluate the proposed method by benchmarking it against well-known DQN variants in nine different environments. Extensive results confirm the superiority of our proposed method in terms of performance and convergence speed.
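To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a preference-guided ϵ-greedy policy might select actions: with probability 1 − ϵ it exploits the greedy action, and with probability ϵ it samples from a learned preference distribution over actions instead of sampling uniformly. The function name, the use of raw preference logits, and the softmax parameterization are illustrative assumptions.

```python
import numpy as np

def preference_guided_epsilon_greedy(q_values, preference_logits, epsilon, rng):
    """Sketch of a preference-guided epsilon-greedy action selector.

    q_values:          per-action Q estimates from the Q branch.
    preference_logits: per-action scores from the (assumed) preference branch.
    epsilon:           exploration probability.
    rng:               numpy random generator.
    """
    if rng.random() < epsilon:
        # Explore: sample from the learned preference distribution
        # (numerically stable softmax over the preference logits),
        # so high-preference actions are drawn more often but every
        # action keeps a nonzero probability of being tried.
        logits = preference_logits - preference_logits.max()
        probs = np.exp(logits) / np.exp(logits).sum()
        return int(rng.choice(len(q_values), p=probs))
    # Exploit: greedy action with respect to the Q values.
    return int(np.argmax(q_values))
```

Compared with standard ϵ-greedy, the only change is the exploration branch: uniform sampling is replaced by sampling from the preference distribution, which is what lets exploration remain diverse without biasing the Q-learning targets themselves.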
IEEE Transactions on Neural Networks and Learning Systems