Abstract

The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses for the first time the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for autonomous UAV landing. Our method is based on a divide-and-conquer paradigm that splits the task into sequential sub-tasks, each one assigned to a Deep Q-Network (DQN), hence the name Sequential Deep Q-Network (SDQN). Each DQN in an SDQN is activated by an internal trigger and represents a component of a high-level control policy that navigates the UAV towards the marker. Several technical solutions are implemented, such as combining vanilla and double DQNs and introducing a partitioned buffer replay to address the problem of sample efficiency. One of the main contributions of this work consists in showing how an SDQN trained in a simulator via domain randomization can effectively generalize to real-world scenarios of increasing complexity. The performance of SDQNs is comparable with that of a state-of-the-art algorithm and human pilots, while being quantitatively better in noisy conditions.
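The divide-and-conquer idea above can be sketched as a dispatcher over sub-policies: each sub-task (e.g. marker alignment, then vertical descent) has its own policy, and an internal trigger decides when control passes to the next one. The following is a minimal illustrative sketch, not the paper's implementation; the stage names, observation keys (`dx`, `alt`), and trigger thresholds are all assumptions.

```python
class SDQN:
    """Sketch of a sequential policy: an ordered list of (trigger, policy)
    pairs, where trigger(obs) fires when the current sub-task is complete."""

    def __init__(self, stages):
        self.stages = stages
        self.active = 0  # index of the currently active sub-policy

    def act(self, obs):
        trigger, policy = self.stages[self.active]
        # Hand over to the next sub-policy once the internal trigger fires.
        if trigger(obs) and self.active < len(self.stages) - 1:
            self.active += 1
            _, policy = self.stages[self.active]
        return policy(obs)

# Toy stand-ins for trained DQNs (hypothetical observation fields):
# align over the marker first, then descend.
align = (lambda obs: abs(obs["dx"]) < 0.1,
         lambda obs: "move_left" if obs["dx"] > 0 else "move_right")
descend = (lambda obs: obs["alt"] < 0.2,
           lambda obs: "descend")

controller = SDQN([align, descend])
print(controller.act({"dx": 0.5, "alt": 3.0}))   # alignment stage: "move_left"
print(controller.act({"dx": 0.05, "alt": 3.0}))  # trigger fires: "descend"
```

In the paper each stage is a learned Q-network rather than a hand-coded rule; the sketch only shows how triggers chain the sub-policies into one high-level controller.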

Highlights

  • In the upcoming years, an increasing number of autonomous systems will pervade urban and domestic environments

  • For the marker alignment phase, the Sequential Deep Q-Network trained with domain randomization (SDQN-DR) achieves an accuracy of 91%, while SDQN without DR obtains a lower score (39%)
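The gap between SDQN-DR and plain SDQN in the highlight above comes from domain randomization: resampling the simulator's visual and physical parameters at each training episode so the policy cannot overfit to one rendering. A minimal sketch follows; the parameter names, ranges, and the `env.reset(**cfg)` call are illustrative assumptions, not the paper's simulator API.

```python
import random

def randomize_domain(rng=random):
    # Resample hypothetical simulator knobs for one training episode.
    return {
        "light_intensity": rng.uniform(0.3, 1.5),
        "ground_texture": rng.choice(["grass", "asphalt", "sand", "snow"]),
        "marker_offset_m": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)),
        "camera_noise_std": rng.uniform(0.0, 0.05),
    }

# One fresh configuration per episode:
for episode in range(3):
    cfg = randomize_domain()
    # env.reset(**cfg)  # assumes a simulator exposing these knobs
    print(cfg)
```

Because the policy only ever sees randomized renderings during training, real-world imagery looks like just another domain sample at test time, which is what lets the simulator-trained SDQN-DR transfer.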

Summary

Introduction

An increasing number of autonomous systems will pervade urban and domestic environments. The next generation of Unmanned Aerial Vehicles (UAVs) will require high-level controllers to move in unstructured environments and perform multiple tasks, such as delivering packages and goods. In this scenario, robust control policies for landing pad identification and vertical descent are necessary. Existing work in the literature is mainly based on extracting geometric visual features, with the aid of external sensors, for landing pad identification and vertical descent. We propose a new approach based on recent breakthroughs achieved with differentiable neural policies in the context of Deep Reinforcement Learning.
