Abstract

We present a novel formulation of closed-loop adaptive optics (AO) control as a multi-agent reinforcement learning (MARL) problem in which the controller learns a non-linear policy and needs no a priori information on the dynamics of the atmosphere. We identify the challenges of applying a reinforcement learning (RL) method to AO and, to address them, propose combining model-free MARL for control with an autoencoder neural network to mitigate the effect of noise. Moreover, we extend existing methods of error budget analysis to include an RL controller. Experimental results for an 8 m telescope equipped with a 40×40 Shack-Hartmann system show a significant increase in performance over the integrator baseline and performance comparable to a model-based predictive approach: a linear quadratic Gaussian (LQG) controller with perfect knowledge of the atmospheric conditions. Finally, the error budget analysis provides evidence that the RL controller partially compensates for bandwidth error and helps mitigate the propagation of aliasing.
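
As a rough illustration of the noise-mitigation stage named above, the sketch below shows a denoising autoencoder applied to wavefront-sensor (WFS) measurements before they reach the controller. This is a minimal sketch under assumed layer sizes and names (e.g. WFSDenoisingAutoencoder, n_measurements); it does not reproduce the paper's actual architecture.

    import torch
    import torch.nn as nn

    class WFSDenoisingAutoencoder(nn.Module):
        """Hypothetical denoiser for WFS slope vectors; sizes are illustrative."""

        def __init__(self, n_measurements: int, latent_dim: int = 256):
            super().__init__()
            # Encoder compresses the noisy slope vector to a latent code.
            self.encoder = nn.Sequential(
                nn.Linear(n_measurements, 1024), nn.ReLU(),
                nn.Linear(1024, latent_dim), nn.ReLU(),
            )
            # Decoder reconstructs a denoised estimate of the measurements.
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 1024), nn.ReLU(),
                nn.Linear(1024, n_measurements),
            )

        def forward(self, noisy_slopes: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(noisy_slopes))

    # A 40x40 Shack-Hartmann sensor yields x and y slopes per subaperture.
    model = WFSDenoisingAutoencoder(n_measurements=2 * 40 * 40)
    denoised = model(torch.randn(1, 2 * 40 * 40))  # one noisy WFS frame

Training such a model would pair noisy measurements with low-noise targets (e.g. from simulation) and minimise a mean-squared reconstruction error.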

Highlights

  • Closed-loop adaptive optics (AO) systems are a fundamental component of large telescopes, correcting in real time the dynamically evolving wavefront aberrations introduced by the atmosphere

  • We present a novel formulation of closed-loop adaptive optics (AO) control as a multi-agent reinforcement learning (MARL) problem in which the controller learns a non-linear policy and needs no a priori information on the dynamics of the atmosphere

  • We identify the challenges of designing an RL method for AO control and propose a solution: a model-free multi-agent reinforcement learning controller working as an additive corrector on top of an integrator controller, combined with an autoencoder to mitigate the impact of noise (a minimal sketch of this additive scheme follows this list)
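
A minimal sketch of that additive scheme, assuming each agent controls a disjoint block of deformable mirror (DM) actuators; the partitioning, names, and shapes here are illustrative assumptions, not the paper's exact design:

    import numpy as np

    def marl_additive_correction(c_int, agents, local_states):
        """Add per-agent RL corrections on top of the integrator command.

        c_int        : integrator command for all actuators (1-D array)
        agents       : trained policies, one per actuator block (hypothetical split)
        local_states : per-agent observations
        """
        corrections = [agent(state) for agent, state in zip(agents, local_states)]
        return c_int + np.concatenate(corrections)

    # Example with two dummy agents, each correcting two actuators.
    agents = [lambda s: np.zeros(2), lambda s: np.zeros(2)]
    c = marl_additive_correction(np.zeros(4), agents, [None, None])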


Introduction

Closed-loop adaptive optics (AO) systems are a fundamental component of large telescopes, correcting in real time the dynamically evolving wavefront aberrations introduced by the atmosphere. The classical AO controller relies on a linear relationship between measurements from the wavefront sensor (WFS) and deformable mirror (DM) commands, combined with an integrator whose gain factor mitigates errors introduced by, e.g., the intrinsic loop delay and temporal sampling. More advanced approaches have been developed in the form of model-based predictive controllers, which predict future wavefront distortions and issue the appropriate commands to correct them. This is the case of linear quadratic Gaussian (LQG) controllers, first introduced in [1], which combine a linear dynamics model describing the evolution of the system, a quadratic loss function to be optimised, and Kalman filtering to process noisy inputs. Integrating the reconstructed residual command, ΔC_t, with the previous command applied on the DM, C_{t−1}, weighted by the so-called gain g, mitigates the effects of the imperfect wavefront reconstruction and gives the final expression for the integrator command at timestep t, C_t, as seen in Eq. (2):

    C_t = C_{t−1} + g · ΔC_t        (2)
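
A minimal closed-loop sketch of this integrator update, assuming a calibrated command matrix maps WFS slopes to a residual DM command; the reconstructor and dimensions below are placeholders, and only the update rule of Eq. (2) comes from the text:

    import numpy as np

    def integrator_step(c_prev, slopes, command_matrix, gain):
        """One step of the classical integrator: C_t = C_{t-1} + g * dC_t."""
        delta_c = command_matrix @ slopes   # reconstructed residual command dC_t
        return c_prev + gain * delta_c      # integrate with gain g

    # Illustrative dimensions for a 40x40 Shack-Hartmann system (x/y slopes).
    n_slopes, n_act = 2 * 40 * 40, 41 * 41
    cmat = np.zeros((n_act, n_slopes))      # stands in for a calibrated reconstructor
    c = np.zeros(n_act)                     # previous DM command C_{t-1}
    c = integrator_step(c, np.zeros(n_slopes), cmat, gain=0.5)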
