Abstract

Owing to the maturity of the field, Unmanned Aerial Vehicles (UAVs) now have a wide range of applications. Recently, the use of multiple UAVs and UAV swarms has increased because they can accomplish more complex tasks. Our goal is to use deep learning object detection methods to detect and track a target drone in images captured by another drone. In this work, we review four popular categories of object detectors: two-stage (anchor-based) methods, one-stage (anchor-based) methods, anchor-free methods, and Transformer-based methods. We compare the performance (COCO benchmark) and detection speed (FPS) of these methods on the task of real-time monocular 2D object detection between two drones. We created a new dataset using footage from different scenes, such as cities, villages, forests, highways, and factories; in this dataset, the target drone's bounding boxes appear at multiple scales. Our experiments show that the anchor-free and Transformer-based methods achieve the best detection performance, while the one-stage methods obtain the best detection speed, followed by the anchor-free methods.
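
The following is a minimal, illustrative sketch (not the paper's evaluation code) of how detection speed (FPS) can be measured for an off-the-shelf detector; it assumes PyTorch and torchvision are available and uses torchvision's SSD300 model as a stand-in for the one-stage, anchor-based detectors compared above.

```python
# Illustrative FPS measurement sketch; the model choice and frame size are
# assumptions for demonstration, not the detectors evaluated in the paper.
import time
import torch
import torchvision

# Stand-in one-stage detector from torchvision with pretrained weights.
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Dummy frames standing in for drone-captured images; a real benchmark would
# iterate over dataset images and also report COCO-style AP.
frames = [torch.rand(3, 300, 300, device=device) for _ in range(100)]

with torch.no_grad():
    start = time.time()
    for frame in frames:
        _ = model([frame])  # torchvision detectors take a list of CHW tensors
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"Approximate detection speed: {len(frames) / elapsed:.1f} FPS")
```

The same timing loop can be reused for any of the detector families discussed (two-stage, one-stage, anchor-free, Transformer-based) by swapping the model, which keeps the speed comparison consistent across methods.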
