Abstract

We present a deep neural network controller trained with a model-free reinforcement learning (RL) algorithm to achieve hover stabilization for a quadrotor unmanned aerial vehicle (UAV). Two neural networks are trained: one serves as a stochastic controller, outputting a distribution over control inputs; the other maps the UAV state to a scalar estimate of the expected return, which evaluates the controller. Proximal policy optimization (PPO), an actor–critic policy gradient method, is used to train both networks. Simulation results show that the trained controller achieves performance comparable to a manually tuned proportional-derivative (PD) controller, despite using no model information. The paper also examines different choices of reward function and their influence on controller performance.
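The abstract names PPO as the training algorithm. As a point of reference only (not code from the paper), the following is a minimal NumPy sketch of PPO's clipped surrogate objective, the quantity the actor network is trained to maximize; all argument names and the clipping parameter `eps=0.2` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # Probability ratio between the updated and the data-collecting policy,
    # computed from log-probabilities of the sampled actions.
    ratio = np.exp(logp_new - logp_old)
    # Clipping keeps each update close to the old policy, which is the
    # core stabilization idea of PPO.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic (elementwise minimum) surrogate objective, averaged
    # over the batch; gradient ascent on this trains the actor.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

In the paper's setting, the advantages would come from the critic network (the scalar value estimate mentioned above); here they are simply passed in as an array.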
