In the ongoing race towards experimental implementations of quantum error correction (QEC), finding ways to automatically discover codes and encoding strategies tailored to the qubit hardware platform is emerging as a critical problem. Reinforcement learning (RL) has been identified as a promising approach, but so far it has been severely restricted in terms of scalability. In this work, we significantly expand the power of RL approaches to QEC code discovery. Specifically, we train an RL agent that automatically discovers, from scratch, both QEC codes and their encoding circuits for a given gate set, qubit connectivity and error model. This is enabled by a reward based on the Knill-Laflamme conditions and by a vectorized Clifford simulator. We demonstrate the effectiveness of the approach with up to 25 physical qubits and distance-5 codes, and present a roadmap to scale it to 100 qubits and distance-10 codes in the near future. We also introduce the concept of a noise-aware meta-agent, which learns to produce encoding strategies simultaneously for a range of noise models, thus leveraging transfer of insights between different situations. Our approach opens the door towards hardware-adapted, accelerated discovery of QEC approaches across the full spectrum of quantum hardware platforms of interest.
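To illustrate what a reward based on the Knill-Laflamme conditions can look like for a stabilizer code, the following is a minimal sketch, not the reward used in this work. It assumes the code is given as a list of Pauli-string stabilizer generators and scores it by the fraction of low-weight Pauli errors that anticommute with at least one generator, meaning they produce a non-trivial syndrome. The function names `anticommutes` and `detected_fraction` are hypothetical. For simplicity the sketch counts errors that lie inside the stabilizer group as undetected, although such errors are harmless, and it enumerates errors one by one rather than in a vectorized fashion.

```python
from itertools import combinations, product


def anticommutes(p, q):
    """Return True if the Pauli strings p and q anticommute.

    Two Pauli strings anticommute exactly when they carry different
    non-identity Paulis on an odd number of qubits.
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def detected_fraction(stabilizers, max_weight):
    """Fraction of Pauli errors of weight 1..max_weight with a non-trivial syndrome.

    Illustrative assumption: an error counts as detected only if it
    anticommutes with at least one stabilizer generator.
    """
    n = len(stabilizers[0])
    detected = 0
    total = 0
    for weight in range(1, max_weight + 1):
        for support in combinations(range(n), weight):
            for letters in product("XYZ", repeat=weight):
                # Build the error as a Pauli string, identity off the support.
                error = ["I"] * n
                for qubit, pauli in zip(support, letters):
                    error[qubit] = pauli
                error = "".join(error)
                total += 1
                if any(anticommutes(error, s) for s in stabilizers):
                    detected += 1
    return detected / total


# Generators of the five-qubit [[5,1,3]] code.
five_qubit_code = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
# Generators of the three-qubit bit-flip repetition code.
repetition_code = ["ZZI", "IZZ"]

print(detected_fraction(five_qubit_code, 2))
print(detected_fraction(repetition_code, 1))
```

A code of distance d detects every error of weight up to d - 1, so the five-qubit code reaches the maximal score for `max_weight=2`, whereas the repetition code misses single-qubit Z errors. A score of this kind rises as more errors become detectable, which is the property an RL reward needs.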