Abstract

Reinforcement learning algorithms driven by reward signals can be used to solve sequential decision problems. In practice, however, they still suffer from reward imbalance, which limits their use in many contexts. To address this unbalanced-reward problem, we propose a novel model-based reinforcement learning algorithm called expected n-step value iteration (EnVI). Unlike traditional model-based reinforcement learning algorithms, the proposed method uses a new return function that changes the discounting of future rewards while reducing the influence of the current reward. We evaluated the performance of the proposed algorithm on a Treasure-Hunting game and a Hill-Walking game. The results demonstrate that the proposed algorithm reduces the negative impact of unbalanced rewards and greatly improves the performance of traditional reinforcement learning algorithms.
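For context, the sketch below shows a conventional n-step return in a tabular value-iteration setting. The abstract does not give the exact EnVI return function, so the down-weighting of the immediate reward is illustrated only through a hypothetical weight `w0`; it is an assumed placeholder, not the authors' formula.

```python
import numpy as np

def n_step_return(rewards, v_next, gamma=0.99, w0=1.0):
    """Conventional n-step return:
        G = w0 * r_0 + gamma * r_1 + ... + gamma^(n-1) * r_{n-1} + gamma^n * V(s_n)

    `w0` is a hypothetical weight on the immediate reward, included only to
    illustrate the idea of reducing the influence of the current reward; the
    actual EnVI return function is defined in the paper, not reproduced here.
    """
    n = len(rewards)
    g = w0 * rewards[0]
    for k in range(1, n):
        g += (gamma ** k) * rewards[k]      # discounted intermediate rewards
    g += (gamma ** n) * v_next              # bootstrap from the value of s_n
    return g

# Example: three observed rewards, bootstrapping from V(s_3) = 0.5
print(n_step_return([1.0, 0.0, 2.0], v_next=0.5, gamma=0.9, w0=0.5))
```

With `w0 < 1`, the immediate reward contributes less to the target than the bootstrapped future values, which is one simple way to think about rebalancing current versus future rewards in an n-step backup.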
