Abstract
Nonlinear flight controllers for fixed-wing unmanned aerial vehicles (UAVs) can potentially be developed using deep reinforcement learning. However, there is often a reality gap between the simulation models used to train these controllers and the real world. This study experimentally investigated the application of deep reinforcement learning to the pitch control of a UAV in wind tunnel tests, with a particular focus on the effect of time delays on flight controller performance. Multiple neural networks were trained in simulation with different assumed time delays and then tested in the wind tunnel. The neural networks trained with shorter delays tended to be susceptible to delay in the real tests and produced fluctuating behaviour. The neural networks trained with longer delays behaved more conservatively and did not produce oscillations, but suffered steady-state errors under some conditions due to unmodelled frictional effects. These results highlight the importance of performing physical experiments to validate controller performance, and show that the training approach used with reinforcement learning needs to be robust to reality gaps between simulation and the real world.
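As an illustration of what an assumed time delay in a training simulation can look like, the following is a minimal sketch and not the authors' implementation. It assumes the delay is modelled as a whole number of simulation steps between the controller output and the actuator. The names `DelayLine` and `simulate_pitch`, the toy pitch dynamics, the proportional stand-in for the neural network, and all numeric values are hypothetical.

```python
from collections import deque


class DelayLine:
    """FIFO buffer that returns the value pushed `delay_steps` calls earlier."""

    def __init__(self, delay_steps: int, initial: float = 0.0):
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        # Pre-fill so the first `delay_steps` outputs equal `initial`.
        self._buf = deque([initial] * delay_steps)

    def step(self, value: float) -> float:
        self._buf.append(value)
        return self._buf.popleft()


def simulate_pitch(delay_steps, gain=4.0, damping=1.0, dt=0.01,
                   steps=1000, target=0.1):
    """Toy closed loop with a delayed command (hypothetical dynamics).

    Assumed model: theta_ddot = u_delayed - damping * theta_dot,
    integrated with a simple Euler scheme.
    """
    line = DelayLine(delay_steps)
    theta, rate = 0.0, 0.0
    history = []
    for _ in range(steps):
        # Proportional law used only as a stand-in for a trained policy.
        command = gain * (target - theta)
        # The actuator sees the command issued `delay_steps` steps ago.
        applied = line.step(command)
        rate += (applied - damping * rate) * dt
        theta += rate * dt
        history.append(theta)
    return history
```

Running `simulate_pitch` with several values of `delay_steps` gives pitch trajectories that can be compared for the same controller under different assumed delays. A real study would replace the toy dynamics and the proportional law with the actual aircraft model and trained network.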
Highlights
Machine learning has become a prevalent approach to train controllers for a variety of applications in fields such as robotics, game playing and aviation
The deep reinforcement learning was completed offline in simulation, and the trained neural networks were used as an elevator controller for online closed-loop control
Eight neural networks, namely NN0A, NN0B, NN100A, NN100B, NN200A, NN200B, NN300A and NN400A, were chosen and used in both simulations and experiments to produce the results presented
Summary
Machine learning has become a prevalent approach to train controllers for a variety of applications in fields such as robotics, game playing and aviation. Supervised learning is a common method of training, where neural networks learn from training data generated using baseline controllers. In the aviation domain, supervised learning has been proposed as an alternative decision-making method [1], with the lookup table for a collision avoidance system being replaced with a neural network to increase the efficiency of the system. Nonlinear transformations for feedback linearisation have been represented by neural networks with [4] and without [5] recurrent architectures. These examples demonstrate the function approximation abilities of neural networks.