Abstract

Applying quantum computing techniques to machine learning has recently attracted widespread attention, and quantum machine learning has become a hot research topic. Machine learning falls into three major categories: supervised, unsupervised, and reinforcement learning (RL). Of the three, quantum RL has made the least progress. In this study, we implement the well-known RL algorithm Q learning with a quantum neural network and evaluate it in the grid world environment. RL is learning through interactions with the environment, with the aim of discovering a policy that maximizes the expected cumulative reward. RL problems pose unique challenges: learning is sequential in nature, reward signals may be long delayed, and the state and action spaces can be large or infinite. This study extends our previous work on solving the contextual bandit problem with a quantum neural network, where the reward signal arrives immediately after each action.

Highlights

  • Great progress has been made in artificial intelligence, machine learning, deep learning, and reinforcement learning (RL)

  • Model-based RL assumes that the agent has access to a model of the environment and learns from it, while model-free RL assumes that the agent has no such model and must learn from direct experience with the environment

  • The Q learning algorithm used in this study is model-free (its update rule is sketched after these highlights)
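
As a reference for the last highlight, this is the standard Q learning update rule; the notation follows common RL conventions and is not specific to this paper:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

Here \alpha is the learning rate and \gamma is the discount factor. The update uses only sampled transitions (s_t, a_t, r_{t+1}, s_{t+1}) and never queries the environment's transition dynamics, which is what makes Q learning model-free.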

Introduction

Great progress has been made in artificial intelligence, machine learning, deep learning, and reinforcement learning (RL). Deep learning based on neural networks has demonstrated its power in supervised learning problems such as computer vision and machine translation. The goal of RL is to teach an agent how to act from a given state in an unknown environment. Unlike the one-step decisions of supervised learning, decision making in RL is sequential: the agent takes an action, receives a reward and the next state, and then acts upon that new state. The purpose of RL is for the agent to learn a strategy, or policy, that obtains the maximum long-term cumulative reward. A policy is a distribution over actions given states, which the agent uses to determine its action in the current state.
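
To make this interaction loop concrete, the following is a minimal classical tabular Q learning sketch on a small grid world. The grid layout, rewards, and hyperparameters are illustrative assumptions rather than values from this study, and the Q table stands in for the quantum neural network that the study actually trains.

    import numpy as np

    # Hypothetical 4x4 grid world: start in the top-left cell, goal in the
    # bottom-right. Layout, rewards, and hyperparameters are illustrative
    # assumptions, not values from the paper.
    SIZE = 4
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    GOAL = (SIZE - 1, SIZE - 1)
    ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1         # learning rate, discount, exploration

    Q = np.zeros((SIZE, SIZE, len(ACTIONS)))       # tabular Q function
    rng = np.random.default_rng(0)

    def step(state, action):
        """Apply an action; moving off the grid leaves the agent in place."""
        r, c = state
        dr, dc = ACTIONS[action]
        nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
        reward = 1.0 if nxt == GOAL else -0.01     # small per-step penalty
        return nxt, reward, nxt == GOAL

    for episode in range(500):
        state = (0, 0)
        for t in range(100):                       # step cap bounds each episode
            # epsilon-greedy action selection
            if rng.random() < EPSILON:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(Q[state]))
            nxt, reward, done = step(state, action)
            # Q learning update: bootstrap from the greedy value of the next state
            target = reward + GAMMA * np.max(Q[nxt]) * (not done)
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = nxt
            if done:
                break

    print(np.argmax(Q, axis=2))                    # greedy action per cell

After enough episodes, reading off the greedy action in each cell yields a policy that steers the agent toward the goal; this study keeps essentially this learning signal but approximates the Q function with a quantum neural network instead of a table.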
