Abstract

The trade-off between exploration and exploitation remains one of the main challenges for achieving sampling efficiency, solution optimality, and transferability in reinforcement learning. Latent Go-Explore (LGE), built on the Go-Explore framework, currently the most effective framework for sparse-reward environments, avoids the complexity of manually designing state features. However, its state feature space is not well suited to measuring sampling density, and exploring with a single state as the unit is inefficient. To this end, this paper proposes a variant of LGE that takes a state area as its unit, named ALGE, which encodes the real environment distance into the state feature space and explores with state areas as units to further improve exploration efficiency. ALGE is evaluated through a series of experiments in hard-exploration environments, including a continuous maze, a robot environment, and two Atari environments. The results show that ALGE improves the explored-space coverage rate by 12% and 27% in the continuous-maze and robot environments, respectively, and outperforms state-of-the-art Go-Explore-based algorithms in pure exploration on hard-exploration environments including Montezuma's Revenge and Pitfall.
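To make the area-based exploration idea concrete, the sketch below discretizes a learned latent space into cells ("areas"), tracks how densely each cell has been sampled, and selects return goals from the least-visited area, in the Go-Explore spirit of returning to promising states before exploring further. This is a minimal illustrative sketch, not the paper's implementation: the `AreaExplorer` class, the grid-based cell discretization, and all parameter names are assumptions introduced here.

```python
# Minimal sketch of area-based goal selection in a learned latent space.
# All names and the cell-discretization scheme are illustrative assumptions.
import numpy as np

class AreaExplorer:
    def __init__(self, cell_size=0.5):
        self.cell_size = cell_size  # width of each latent-space cell ("area")
        self.cells = {}             # cell index -> list of stored latent states

    def _cell_of(self, z):
        # Map a latent vector to a discrete cell index.
        return tuple(np.floor(np.asarray(z) / self.cell_size).astype(int))

    def add(self, z):
        # Record a visited latent state in its cell.
        self.cells.setdefault(self._cell_of(z), []).append(np.asarray(z))

    def sample_goal(self, rng=np.random):
        # Use per-cell visit counts as a proxy for sampling density and
        # draw a goal state from one of the least-visited areas.
        least_visited = min(self.cells.values(), key=len)
        return least_visited[rng.randint(len(least_visited))]

# Usage: encode observations with a learned encoder, log them, and request
# a goal from a sparsely sampled area.
explorer = AreaExplorer(cell_size=0.5)
for z in np.random.randn(1000, 2):  # stand-in for encoded observations
    explorer.add(z)
goal = explorer.sample_goal()
```

A fixed grid is only one way to define areas; any density estimate over the latent space (e.g., k-nearest-neighbor distances) could play the same role, provided the latent encoding preserves real environment distances as the abstract describes.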
