Abstract

This paper introduces a collection of value-based quantum reinforcement learning algorithms that use Grover’s algorithm to update a policy stored as a superposition over qubits associated with each possible action, and explores their parameters. These algorithms fall into two classes: one that uses state value functions (V(s)) and a new class that uses action value functions (Q(s,a)). The new Q(s,a)-based quantum algorithms are found to converge faster than the V(s)-based algorithms, and in general the quantum algorithms converge in fewer iterations than their classical counterparts, netting larger returns during training. This is due to the fact that the Q(s,a) algorithms are more precise than those based on V(s), so updates are incorporated into the value function more efficiently. This effect is further enhanced by the observation that the Q(s,a)-based algorithms may be trained with higher learning rates. These algorithms are then extended by adding multiple value functions, which are observed to allow larger learning rates and to improve convergence in environments with stochastic rewards; the latter benefit is reinforced by the probabilistic nature of the quantum algorithms. Finally, the quantum algorithms are found to use less CPU time overall than their classical counterparts, meaning that their benefits may be realized even without a full quantum computer.

Highlights

  • This is the environment used by Brown [26], and is similar to grid environments used in recent studies of quantum computing-based reinforcement learning [23], [24]

  • Quantum Q Reinforcement Learning (QQRL) and VQRL only differ in how the value of the current state is stored; that is, QQRL uses Q(s, a) while VQRL only uses V(s)

  • The reason for such improvement in convergence speed is the extra precision provided by Q(s, a) when computing the value of a certain action and choosing the best action; in contrast, V(s) is the expectation of the value over all possible actions, and as a consequence is much less precise
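The Grover-based policy update sketched in these highlights can be illustrated by classically simulating amplitude amplification over a small action register. The following is a minimal sketch, not the paper's implementation: the policy is a vector of amplitudes over actions, and a fixed iteration count `n_iters` stands in for the reward- and value-dependent number of Grover iterations the actual algorithms would compute.

```python
import numpy as np

def grover_iterate(amplitudes, good, n_iters):
    """Amplify the amplitude of the `good` action index via Grover iterations.

    Each iteration applies the oracle (phase flip on the chosen action)
    followed by the diffusion operator (inversion about the mean amplitude).
    """
    a = amplitudes.astype(complex).copy()
    for _ in range(n_iters):
        a[good] *= -1           # oracle: mark the action to reinforce
        a = 2 * a.mean() - a    # diffusion: inversion about the mean
    return a

# Uniform superposition over 4 actions (a 2-qubit action register)
amps = np.full(4, 0.5)
probs = np.abs(grover_iterate(amps, good=2, n_iters=1)) ** 2
# For 4 states, a single Grover iteration boosts the marked action's
# measurement probability from 0.25 to 1.0 exactly.
```

Measuring the register then samples an action according to `probs`, which is how the amplified policy is executed on quantum hardware.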


Introduction

Reinforcement learning algorithms are a subset of machine learning algorithms that find an optimal sequence of actions to achieve a goal; unlike supervised learning algorithms, which learn from labeled examples, reinforcement learning algorithms solve an implicit problem defined only by rewards received from an environment. Current quantum computers are relatively small, such as those produced by IBM Q [13] with 16 or 17 qubits, but the size of these computers is expected to increase as the technology matures. These future quantum computers would be able to solve certain problems that are intractable for classical computers. In this paper we consider only the application of Grover’s algorithm to reinforcement learning.
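Grover's algorithm finds a marked item among N unstructured items in roughly π√N/4 oracle queries rather than the ~N/2 expected classically. The quadratic speedup can be checked with the standard closed-form success probability sin²((2k+1)θ), where θ = arcsin(1/√N); this is a textbook calculation, independent of the paper's specific algorithms.

```python
import math

def grover_success_prob(n_items, n_iters):
    """Probability of measuring the marked item after n_iters Grover iterations."""
    theta = math.asin(1 / math.sqrt(n_items))
    return math.sin((2 * n_iters + 1) * theta) ** 2

N = 1024
# Optimal iteration count ~ floor(pi/4 * sqrt(N)): 25 queries,
# versus ~512 expected classical guesses for the same search.
k_opt = math.floor(math.pi / 4 * math.sqrt(N))
p = grover_success_prob(N, k_opt)  # > 0.99
```

Note that the probability is not monotone in the iteration count: running past `k_opt` over-rotates and lowers the success probability, which is why the reinforcement learning algorithms considered here must control how many Grover iterations each policy update applies.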
