Abstract

We propose a reinforcement learning (RL) scheme for feedback quantum control within the quantum approximate optimization algorithm (QAOA). QAOA requires a variational minimization over states constructed by applying a sequence of unitary operators that depend on parameters living in a high-dimensional space. We reformulate this minimum search as a learning task, in which an RL agent chooses the control parameters for the unitaries, given partial information on the system. We show that our RL scheme finds a policy converging to the optimal adiabatic solution for QAOA found by Mbeng et al. (arXiv:1906.08948) for the translationally invariant quantum Ising chain. In the presence of disorder, we show that our RL scheme allows the training to be performed on small samples and transferred successfully to larger systems.
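
To make the construction concrete, the following is a minimal sketch of the QAOA variational ansatz for a small ferromagnetic Ising chain: P layers of exp(-i beta_k H_x) exp(-i gamma_k H_z) applied to the ground state of H_x, with the energy <H_z> as the cost function. The Hamiltonian conventions, system size, and depth here are illustrative assumptions, not the paper's exact setup.

```python
# Minimal dense-matrix sketch of the QAOA ansatz for a ferromagnetic Ising chain.
# Conventions (H_z, H_x, N, P) are illustrative assumptions, not the paper's setup.
import numpy as np
from functools import reduce
from scipy.linalg import expm

N, P = 6, 4                                     # spins, circuit depth

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def op_at(op, site):
    """Embed a single-site operator at `site` into the N-spin Hilbert space."""
    return reduce(np.kron, [op if i == site else I2 for i in range(N)])

# Target H_z = -sum_i s^z_i s^z_{i+1} (periodic) and mixer H_x = -sum_i s^x_i
Hz = -sum(op_at(sz, i) @ op_at(sz, (i + 1) % N) for i in range(N))
Hx = -sum(op_at(sx, i) for i in range(N))

def qaoa_energy(gammas, betas):
    """Apply P layers exp(-i b_k H_x) exp(-i g_k H_z) to |+...+>, return <H_z>."""
    psi = np.ones(2**N, dtype=complex) / np.sqrt(2**N)   # ground state of H_x
    for g, b in zip(gammas, betas):
        psi = expm(-1j * g * Hz) @ psi
        psi = expm(-1j * b * Hx) @ psi
    return float((psi.conj() @ Hz @ psi).real)

rng = np.random.default_rng(0)
print(qaoa_energy(rng.uniform(0, 1, P), rng.uniform(0, 1, P)))
```

Dense matrices keep the sketch short but limit it to roughly a dozen spins; the variational task is then to minimize qaoa_energy over the 2P angles.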

Highlights

  • Quantum optimization and control are at the leading edge of current research in quantum computation [1]

  • We show that strategies “learned” on small samples can be successfully transferred to larger systems, alleviating the “measurement problem”: one can learn a strategy on a small problem which can be simulated on a computer, and implement it on a larger experimental setup [39]

  • In Sec. III, we report our results on the transverse field Ising model (TFIM) in one dimension, with periodic boundary conditions, where detailed quantum approximate optimization algorithm (QAOA) results are already known [14,40] and exact numerical results are obtained via the Jordan-Wigner transformation [41] (see the sketch after this list)
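
For reference, here is a minimal sketch of the Jordan-Wigner benchmark mentioned in the last highlight: the ground-state energy of the periodic TFIM follows from the free-fermion spectrum eps_k = 2*sqrt(1 + h^2 - 2h cos k) summed over the antiperiodic momentum grid of the even-parity sector. The convention H = -sum_i s^z_i s^z_{i+1} - h sum_i s^x_i with J = 1 is an assumption.

```python
# Exact ground-state energy of the periodic TFIM via the Jordan-Wigner mapping
# (even-parity sector, antiperiodic momenta). Convention J = 1 is assumed.
import numpy as np

def tfim_ground_energy(N, h):
    """Sum the free-fermion spectrum eps_k = 2*sqrt(1 + h^2 - 2h cos k) over k > 0."""
    ks = (2 * np.arange(N // 2) + 1) * np.pi / N      # k = (2n+1)*pi/N, N even
    return -np.sum(2.0 * np.sqrt(1.0 + h**2 - 2.0 * h * np.cos(ks)))

# At the critical point h = 1 the energy density tends to -4/pi ~ -1.2732
print(tfim_ground_energy(100, 1.0) / 100)
```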


Summary

INTRODUCTION

Quantum optimization and control are at the leading edge of current research in quantum computation [1]. For quantum Ising chains, smooth regular optimal parameters can be found [14], which are adiabatic in a digitized-QA/AQC [22] context. One might regard QAOA as an optimal control process [23] in which one acts sequentially on the system to maximize a final reward. This reformulation is naturally suited to reinforcement learning (RL) [24,25,26,27]. We show that RL automatically learns smooth schedules, realizing an optimal controlled digitized-QA algorithm [14,38].
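
As a concrete illustration of this reformulation (not the authors' actual agent), the sketch below trains a Gaussian policy with a plain REINFORCE update: each episode, the policy proposes the full set of (gamma_k, beta_k) angles, the QAOA circuit is run, and the negative final energy is the reward. The toy two-spin problem, policy shape, and hyperparameters are assumptions for illustration.

```python
# Illustrative REINFORCE loop for the RL reformulation of QAOA: a Gaussian policy
# proposes all (gamma_k, beta_k) angles, the circuit is run, and the negative
# final energy is the reward. Toy two-spin problem and hyperparameters are
# assumptions; this is not the paper's agent.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
P = 2                                    # circuit depth
mu = np.full(2 * P, 0.5)                 # policy means for [g_1, b_1, g_2, b_2]
sigma, lr = 0.2, 0.05                    # exploration width, learning rate

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)
Hz = -np.kron(sz, sz)                                    # two-spin target
Hx = -(np.kron(sx, np.eye(2)) + np.kron(np.eye(2), sx))  # mixer

def reward(theta):
    """Run the depth-P circuit on |++> and return -<H_z> (higher is better)."""
    psi = np.full(4, 0.5, dtype=complex)
    for k in range(P):
        psi = expm(-1j * theta[2 * k] * Hz) @ psi
        psi = expm(-1j * theta[2 * k + 1] * Hx) @ psi
    return -float((psi.conj() @ Hz @ psi).real)

baseline = 0.0
for step in range(500):
    theta = mu + sigma * rng.standard_normal(2 * P)      # one episode's actions
    R = reward(theta)
    baseline += 0.1 * (R - baseline)                     # running-average baseline
    mu += lr * (R - baseline) * (theta - mu) / sigma**2  # REINFORCE update

print("learned angles:", mu.round(3), "energy:", -reward(mu))
```

The running baseline reduces the variance of the gradient estimate; the paper's scheme additionally conditions the agent on partial information about the system, which this sketch omits.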

RL-ASSISTED QAOA
RESULTS
CONCLUSION AND OUTLOOK