Abstract

Reinforcement learning (RL) has been applied successfully in many domains, such as video games, and achieved remarkable performance; the Deep Q-Network (DQN) is one of its best-known algorithms. However, practical applications still face several difficulties, one of which is the exploration-exploitation dilemma. To address it, common exploration strategies such as ε-greedy have been widely adopted. Unfortunately, they are sample-inefficient and often ineffective, because exploration in the later stages of training remains undirected. In this paper, we propose a model-based exploration method that learns a state transition model to guide exploration. Using standard machine learning training procedures, the state transition model network can be trained to improve both exploration efficiency and sample efficiency. We compare our algorithm with ε-greedy on DQN and apply it to the Atari 2600 games. Evaluated across 14 Atari games in the Arcade Learning Environment (ALE), our algorithm outperforms the decaying ε-greedy strategy.
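The abstract does not include implementation details, so the following is only a minimal illustrative sketch of model-based exploration on top of DQN, under the assumption that a learned state transition model's prediction error is used as an exploration bonus. All names (TransitionModel, intrinsic_reward, beta) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: a learned state-transition model trained on replay data,
# whose prediction error is added to the environment reward as an exploration bonus.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionModel(nn.Module):
    """Predicts next_state from (state, one-hot action)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        a = F.one_hot(action, self.n_actions).float()
        return self.net(torch.cat([state, a], dim=-1))

def intrinsic_reward(model, state, action, next_state):
    """Exploration bonus: the model's prediction error on this transition."""
    with torch.no_grad():
        pred = model(state, action)
    return ((pred - next_state) ** 2).mean(dim=-1)

def train_transition_model(model, optimizer, state, action, next_state):
    """One supervised regression step on the transition model."""
    loss = F.mse_loss(model(state, action), next_state)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# In a DQN loop, the bonus could be added to the extrinsic reward before the
# transition is stored in the replay buffer, e.g.:
#   r_total = r_env + beta * intrinsic_reward(model, s, a, s_next)
```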
