Abstract

We consider the problem of vision-based detection and ranging of a target UAV using the video feed from a monocular camera onboard a pursuer UAV. Our previously published work in this area employed a cascade classifier algorithm to locate the target UAV, which was found to perform poorly in complex background scenes. We thus study the replacement of the cascade classifier with newer machine learning-based object detection algorithms. Five candidate algorithms are implemented and quantitatively tested in terms of their efficiency (measured as frames-per-second processing rate), accuracy (measured as the root mean squared error between ground truth and detected location), and consistency (measured as mean average precision) across a variety of flight patterns, backgrounds, and test conditions. Assigning relative weights of 20%, 40%, and 40% to these three criteria, we find that when flying over a white background, the top three performers are YOLO v2 (76.73 out of 100), Faster RCNN v2 (63.65 out of 100), and Tiny YOLO (59.50 out of 100), while over a realistic background, the top three performers are Faster RCNN v2 (54.35 out of 100), SSD MobileNet v1 (51.68 out of 100), and SSD Inception v2 (50.72 out of 100), leading us to recommend Faster RCNN v2 as the overall solution. We then provide a roadmap for further work in integrating the object detector into our vision-based UAV tracking system.
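The 20%/40%/40% weighting over efficiency, accuracy, and consistency described above can be sketched as follows. This is a minimal illustration: the function name and the assumption that each raw metric (FPS, RMS error, mAP) has already been normalized to a 0-100 score are ours, not the paper's code.

```python
def composite_score(efficiency, accuracy, consistency,
                    weights=(0.20, 0.40, 0.40)):
    """Combine per-criterion scores (each on a 0-100 scale) into a
    weighted total, using the 20%/40%/40% split from the abstract."""
    w_eff, w_acc, w_con = weights
    return w_eff * efficiency + w_acc * accuracy + w_con * consistency

# A hypothetical detector scoring 90 on speed, 70 on accuracy,
# and 75 on consistency:
print(composite_score(90, 70, 75))  # 0.2*90 + 0.4*70 + 0.4*75 = 76.0
```

Under this scheme, accuracy and consistency together dominate the ranking, which is why a slower but more precise detector such as Faster RCNN v2 can come out ahead over realistic backgrounds.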

Highlights

  • Human beings can detect and track objects using their eyes

  • Accuracy is evaluated by taking the Root Mean Square (RMS) error between the location estimated by the object detection system and the ground truth location obtained from the Vicon motion-capture system along the side (x), height (y), and depth (z) directions

  • Consistency of an object detection system is measured using the mean Average Precision metric introduced in Section 2.3 and reflects the quality of the bounding box estimates provided by the Application Programming Interface (API)
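The per-axis RMS error described in the highlights can be sketched as below. The function name and the data layout (parallel lists of (x, y, z) tuples, one per frame, with the Vicon measurements taken as ground truth) are illustrative assumptions, not the authors' code.

```python
import math

def per_axis_rmse(estimates, ground_truth):
    """Root mean squared error along the side (x), height (y), and
    depth (z) axes between detector estimates and Vicon ground truth.

    Both arguments are lists of (x, y, z) tuples in the same frame
    order; returns a (rmse_x, rmse_y, rmse_z) tuple."""
    n = len(estimates)
    sums = [0.0, 0.0, 0.0]
    for est, gt in zip(estimates, ground_truth):
        for axis in range(3):
            sums[axis] += (est[axis] - gt[axis]) ** 2
    return tuple(math.sqrt(s / n) for s in sums)

# Two frames with a constant 0.1 m error in depth only:
est = [(1.0, 2.0, 3.1), (1.0, 2.0, 3.1)]
gt  = [(1.0, 2.0, 3.0), (1.0, 2.0, 3.0)]
print(per_axis_rmse(est, gt))
```

Reporting the error per axis rather than as a single Euclidean norm makes it easy to see that depth (z), estimated from apparent target size in a monocular image, is typically the least accurate direction.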



Introduction

Human beings can detect and track objects using their eyes, and the human brain can differentiate between different kinds of objects, e.g., different animal species. This article presents work performed on the implementation and benchmarking of various machine learning algorithms for the task of detection and ranging of a target UAV using the video feed from a monocular camera mounted onboard a pursuer UAV. While other studies have been published on testing and benchmarking vision-based UAV detection and ranging, for instance [9,10,11], our study is unique in combining a broad choice of object detection algorithms (five candidates), access to exact ground truth provided by an indoor motion-capture system, and use of the commercial Parrot AR.Drone 2.0 UAV, which brings the challenges of a difficult-to-spot frontal profile due to its protective styrofoam hull and the low-resolution video from its onboard camera.

Background and Methodology
TensorFlow
Inception
MobileNet
Darknet
Detection Precision Metrics
Recall and Precision
Distance Estimation from a Monocular Camera
Camera Calibration
Experimental Testing Procedure
Overview
TensorFlow APIs Training
Darknet APIs Training
Object Detection Results
Running Speed
Accuracy
Offset in Vicon Camera System
Impact of Camera Calibration
Accuracy of Object Detection Systems
Consistency Results Discussion
Choice of Object Detection System
Conclusions