Abstract

Reinforcement Learning is at the core of a recent revolution in Artificial Intelligence. Simultaneously, we are witnessing the emergence of a new field: Quantum Machine Learning. In the context of these two major developments, this work addresses the interplay between Quantum Computing and Reinforcement Learning. Learning by interaction is possible in the quantum setting using the concept of oraculization of environments. The paper extends previous oracular instances to address more general stochastic environments. In this setting, we develop a novel quantum algorithm for near-optimal decision-making based on the Reinforcement Learning paradigm known as Sparse Sampling. The proposed algorithm exhibits a quadratic speedup over its classical counterpart. To the best of the authors' knowledge, this is the first quantum planning algorithm whose time complexity is independent of the number of states of the environment, which makes it suitable for large state space environments, where planning is otherwise intractable.

Highlights

  • We take one step further and require the simulated environment to be fully quantized, a notion first introduced in [6], [7], allowing a quantum agent to act in its environment according to the laws of quantum mechanics. Based on this interaction, we prove that a quantum version of the Sparse Sampling algorithm produces near-optimal actions with quadratically less computational effort than its classical counterpart.

  • This demonstrates that the proposed quantum algorithm suggests an ε-optimal action to be taken in any initial state of a given Markov Decision Process (MDP) with quadratically less computational effort than the original classical Sparse Sampling algorithm.

  • The total number of queries performed by the classical algorithm equals the number of times the condition in line 1 of the EstimateQ() method evaluates to True; a classical sketch of this routine is given after these highlights.
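
The Sparse Sampling baseline referenced in these highlights is the classical planner of Kearns, Mansour and Ng, whose cost is independent of the number of states. The Python sketch below illustrates the EstimateQ/EstimateV recursion and where the counted generative-model queries occur; the MDP interface (sample_transition, actions) and the toy model are hypothetical stand-ins for illustration, not the paper's code.

# Minimal sketch of the classical Sparse Sampling planner (Kearns, Mansour & Ng),
# the baseline whose query count the quantum algorithm improves quadratically.
# The MDP interface used here (sample_transition, actions) is a hypothetical
# stand-in for a generative model returning a next state and a reward for (s, a).
import random

def estimate_v(mdp, state, depth, width, gamma):
    """Estimate V*(state) with a depth-limited sparse lookahead tree."""
    if depth == 0:
        return 0.0
    # Value of the greedy action at this node.
    return max(estimate_q(mdp, state, a, depth, width, gamma) for a in mdp.actions)

def estimate_q(mdp, state, action, depth, width, gamma):
    """Average `width` sampled outcomes of (state, action); each sample is one
    query to the generative model, which is what the highlight above counts."""
    total = 0.0
    for _ in range(width):
        next_state, reward = mdp.sample_transition(state, action)
        total += reward + gamma * estimate_v(mdp, next_state, depth - 1, width, gamma)
    return total / width

def sparse_sampling_action(mdp, state, depth=3, width=4, gamma=0.95):
    """Return a near-optimal action; cost grows as (width * |A|)^depth,
    independent of the number of states."""
    return max(mdp.actions,
               key=lambda a: estimate_q(mdp, state, a, depth, width, gamma))

# Toy generative model used only to exercise the planner.
class TwoStateMDP:
    actions = (0, 1)
    def sample_transition(self, state, action):
        next_state = action if random.random() < 0.8 else 1 - action
        reward = 1.0 if next_state == 1 else 0.0
        return next_state, reward

if __name__ == "__main__":
    print(sparse_sampling_action(TwoStateMDP(), state=0))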


Summary

INTRODUCTION

The oracle prepares a linear combination over all possible transition states, weighted by the product of the state transition probabilities and the corresponding outcome rewards. This reasoning can be extended to allow for h interactions, i.e., sequences of h actions, by resorting to the quantum oracular environment O, as given by Equation (12); this is equivalent to computing a lookahead tree of depth h in superposition. O|ψ_0⟩ acts on the respective transition-step sub-registers, preparing a superposition state |ψ⟩ in which the term with |r⟩ = |1⟩ and the highest amplitude represents the maximum expected reward. The core loop of the procedure reads:

|ψ_i⟩ ← |ψ_i⟩ ⊗ |a_i⟩ ⊗ |s_{i+1}⟩;
|ψ_i⟩ ← T(|s_i⟩ ⊗ |a_i⟩ ⊗ |s_{i+1}⟩);
|ψ_{i+1}⟩ ← R_s(|s_{i+1}⟩ ⊗ |r⟩);
i ← i + 1;
end
action ← QSearch(|ψ_{h−1}⟩);
A[action] ← A[action] + 1;
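
To make the lookahead-in-superposition idea concrete, the following purely classical Python sketch enumerates every depth-h action sequence and weights each branch by its transition probabilities, which is the bookkeeping the quantum oracle performs in superposition before QSearch extracts the best first action. The MDP tables (P, R, ACTIONS) and helper names are illustrative assumptions, not taken from the paper.

# Classical stand-in for the superposition prepared by the oracular environment O:
# it enumerates every depth-h branch explicitly and weights each branch by the
# product of its transition probabilities, where the quantum procedure would hold
# all branches in one register and pick the best one with QSearch.
from itertools import product

P = {  # P[(s, a)] = {next_state: probability}  -- hypothetical toy MDP
    (0, 'left'):  {0: 0.9, 1: 0.1},
    (0, 'right'): {0: 0.2, 1: 0.8},
    (1, 'left'):  {0: 0.5, 1: 0.5},
    (1, 'right'): {0: 0.1, 1: 0.9},
}
R = {0: 0.0, 1: 1.0}          # reward collected on entering each state
ACTIONS = ('left', 'right')

def expected_return(s0, actions):
    """Expected sum of rewards along a fixed action sequence from s0."""
    dist = {s0: 1.0}
    total = 0.0
    for a in actions:
        nxt = {}
        for s, p in dist.items():
            for s2, q in P[(s, a)].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
                total += p * q * R[s2]
        dist = nxt
    return total

def best_first_action(s0, h):
    """Classically search every depth-h action sequence; the quantum algorithm
    searches the same branch space quadratically faster via QSearch."""
    best = max(product(ACTIONS, repeat=h), key=lambda seq: expected_return(s0, seq))
    return best[0]

if __name__ == "__main__":
    print(best_first_action(s0=0, h=3))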

COMPLEXITY ANALYSIS
BOUNDING THE SEARCH SPACE
BOUNDING THE SAMPLE SIZE
NUMERICAL EXPERIMENTS AND RESULTS
STOCHASTIC GRIDWORLD
CONCLUSION
