Abstract

A new method is developed for enabling a quadrotor micro air vehicle (MAV) to navigate unknown environments using reinforcement learning (RL) and model predictive control (MPC). An efficient implementation of MPC provides vehicle control and obstacle avoidance. RL is used to guide the MAV through complex environments where dead-end corridors may be encountered and backtracking is necessary. All of the presented algorithms were deployed on embedded hardware using automatic code generation from Simulink. Results are given for flight tests, demonstrating that the algorithms achieve robust navigation with modest computing requirements.
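
As a rough, illustrative sketch only (the abstract does not specify the learning algorithm): the idea of learning to escape dead ends can be shown with tabular Q-learning on a toy occupancy grid. Everything below, the grid, rewards, actions, and hyperparameters, is a hypothetical stand-in rather than the paper's formulation; in the paper the learned policy provides high-level guidance that the MPC layer then tracks.

    # Hedged sketch: tabular Q-learning on a toy grid containing a dead-end
    # corridor. All details here are illustrative assumptions, not the
    # paper's formulation.
    import random

    GRID = [
        [0, 0, 0, 0, 0],   # 0 = free cell, 1 = wall
        [0, 1, 0, 1, 0],   # the middle column heads toward the goal
        [0, 1, 0, 1, 0],   # but is sealed off below: a dead end
        [0, 1, 1, 1, 0],   # that forces the agent to backtrack
        [0, 0, 0, 0, 0],
    ]
    START, GOAL = (0, 0), (4, 4)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    Q = {}  # (state, action) -> learned value, default 0.0

    def step(state, action):
        """Move one cell; walls and borders block motion and cost extra."""
        r, c = state[0] + action[0], state[1] + action[1]
        if not (0 <= r < 5 and 0 <= c < 5) or GRID[r][c] == 1:
            return state, -1.0   # bumped a wall: wasted time
        return (r, c), (10.0 if (r, c) == GOAL else -0.1)

    def train(episodes=500, alpha=0.5, gamma=0.95, eps=0.2):
        for _ in range(episodes):
            state = START
            for _ in range(100):
                if random.random() < eps:        # epsilon-greedy exploration
                    action = random.choice(ACTIONS)
                else:
                    action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
                nxt, reward = step(state, action)
                best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
                old = Q.get((state, action), 0.0)
                Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
                state = nxt
                if state == GOAL:
                    break

    train()

After training, the greedy policy routes around the dead end rather than into it; in the paper this role is played by the RL exploration layer that decides when the MAV must turn around and backtrack.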

Highlights

  • This paper introduces a method for navigation and control of quadrotors within a non-convex obstacle field

  • The same scenario was flown again, this time with the addition of the reinforcement learning exploration algorithm presented earlier in the paper

  • The effect of the reinforcement learning can be seen in Fig. 8b, where the micro air vehicle (MAV) takes a seemingly direct route past the obstacles towards the goal but gets stuck in a dead end, requiring it to turn around

Introduction

This paper introduces a method for navigation and control of quadrotors within a non-convex obstacle field. The method uses online optimization within a model predictive control (MPC) framework, taking advantage of Fast MPC (Wang and Boyd 2010) with soft constraint modifications (Richards 2015) to provide a real-time controller on embedded hardware. The use of reinforcement learning (RL) enables autonomous navigation by providing high-level path-planning decisions for navigating previously unexplored spaces. Flight test experiments demonstrate the methods within a two-dimensional control scenario. The experiments use off-board localization by motion capture and synthesized sensing of obstacles; these simplifications set aside important challenges, as the focus here is on the decision-making. Trajectory generation in the presence of obstacles is NP-hard (Reif 1979) and has been the subject of considerable algorithm development, including randomized …
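
In outline, the controller cited above solves a quadratic program over a receding horizon at every control step, with slack variables softening the state constraints so that a momentary violation degrades the cost rather than making the problem infeasible. The sketch below sets up one such step for a 2D double integrator; it is an assumption-laden illustration (the dynamics, weights, and the single half-plane obstacle constraint are invented for the example), and it uses cvxpy for readability where Fast MPC instead exploits the problem's banded structure with a custom interior-point solver.

    # Hedged sketch: one soft-constrained MPC step for a 2D double integrator,
    # in the spirit of Fast MPC (Wang and Boyd 2010) with soft constraints
    # (Richards 2015). Dynamics, weights, and the obstacle half-plane are
    # invented for illustration.
    import cvxpy as cp
    import numpy as np

    dt, N = 0.1, 20               # step length [s] and horizon length
    # State x = [px, py, vx, vy], input u = [ax, ay] (double integrator)
    A = np.block([[np.eye(2), dt * np.eye(2)],
                  [np.zeros((2, 2)), np.eye(2)]])
    B = np.block([[0.5 * dt**2 * np.eye(2)],
                  [dt * np.eye(2)]])

    x = cp.Variable((4, N + 1))
    u = cp.Variable((2, N))
    s = cp.Variable(N, nonneg=True)    # slacks soften the obstacle constraint

    x0 = np.zeros(4)
    x_goal = np.array([2.0, 0.0, 0.0, 0.0])
    Q = np.diag([10.0, 10.0, 1.0, 1.0])
    R = 0.1 * np.eye(2)
    rho = 1e3                          # large linear penalty: slacks stay ~0
                                       # whenever the hard problem is feasible
    # Local obstacle model: keep the position on one side of a half-plane
    # a^T p >= b (a, b would come from the sensed obstacle geometry).
    a = np.array([0.0, -1.0])
    b = -0.5                           # here: py <= 0.5

    cost, constr = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k] - x_goal, Q) + cp.quad_form(u[:, k], R)
        cost += rho * s[k]
        constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                   a @ x[:2, k + 1] >= b - s[k],      # softened constraint
                   cp.norm(u[:, k], "inf") <= 2.0]    # input limits
    cost += cp.quad_form(x[:, N] - x_goal, Q)          # terminal cost

    cp.Problem(cp.Minimize(cost), constr).solve()
    print("first input:", u[:, 0].value)  # apply u[:, 0], then re-solve
                                          # at the next step (receding horizon)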
