Abstract
We present a deep neural-network controller trained by a model-free reinforcement learning (RL) algorithm to achieve hover stabilization for a quadrotor unmanned aerial vehicle (UAV). Two neural networks are trained: one serves as a stochastic controller that outputs a distribution over control inputs, while the other maps the UAV state to a scalar value estimate of the expected return under that controller. A proximal policy optimization (PPO) method, an actor–critic policy gradient approach, is used to train both networks. Simulation results show that the trained controller achieves performance comparable to that of a manually tuned proportional-derivative (PD) controller, despite using no model information. We also examine different choices of reward function and their influence on controller performance.
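At the core of PPO is a clipped surrogate objective that limits how far each policy update can move from the old policy. The function below is a minimal per-sample sketch of that objective, not the paper's implementation; the clipping parameter value is an assumption (0.2 is a common default):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate loss (negated for minimization).

    ratio     -- pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage -- advantage estimate supplied by the critic network
    eps       -- clipping range (assumed hyperparameter)
    """
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] so large policy steps
    # receive no additional credit from the objective.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # PPO maximizes min(unclipped, clipped); negate for gradient descent.
    return -min(unclipped, clipped)
```

In practice this loss is averaged over a batch of state–action samples and combined with a value-function loss for the critic, but the clipping logic itself is exactly this pointwise operation.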