Abstract

Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that RL is slower and less stable than supervised learning. We explore the possibility that ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of crowds: estimates from multiple Q-function approximators are combined using a simple scheme similar to the supervised learning approach known as bagging. Bagging approaches have not yet found widespread adoption in the RL literature, nor has their performance been comprehensively evaluated. Our results show that the proposed approach improves performance on all three tasks and with all RL approaches attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action-value function. Subsequent experimentation demonstrates that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that this approach can decrease the amount of time needed to solve problems that require a deep Q-network (DQN) approach.
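As a rough sketch of the kind of combination the abstract describes, the snippet below averages the action-value estimates of several independently trained Q-function approximators and acts greedily on the result. The QEnsemble class, the toy linear members, and the simple mean-combination rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class QEnsemble:
    """Bagging-style ensemble: average Q-estimates, then act greedily."""

    def __init__(self, members):
        # Each member is a callable mapping a state to a vector of Q(s, a).
        self.members = members

    def q_values(self, state):
        # Combine per-member action-value estimates by simple averaging.
        return np.mean([m(state) for m in self.members], axis=0)

    def act(self, state):
        # Greedy action with respect to the combined estimate.
        return int(np.argmax(self.q_values(state)))


# Toy usage: three linear approximators over a 4-dimensional state and
# 2 actions, each initialized (standing in for training) independently.
rng = np.random.default_rng(0)
members = [
    (lambda W: (lambda s: W @ s))(rng.normal(size=(2, 4)))
    for _ in range(3)
]
ensemble = QEnsemble(members)
state = rng.normal(size=4)
print(ensemble.act(state))
```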
