Abstract
We consider the problem of vision-based detection and ranging of a target UAV using the video feed from a monocular camera onboard a pursuer UAV. Our previously published work in this area employed a cascade classifier algorithm to locate the target UAV, which was found to perform poorly against complex background scenes. We therefore study the replacement of the cascade classifier with newer machine learning-based object detection algorithms. Five candidate algorithms are implemented and quantitatively tested in terms of their efficiency (measured as frames-per-second processing rate), accuracy (measured as the root mean squared error between ground truth and detected location), and consistency (measured as mean average precision) across a variety of flight patterns, backgrounds, and test conditions. Assigning relative weights of 20%, 40%, and 40% to these three criteria, we find that when flying over a white background, the top three performers are YOLO v2 (76.73 out of 100), Faster RCNN v2 (63.65 out of 100), and Tiny YOLO (59.50 out of 100), while over a realistic background, the top three performers are Faster RCNN v2 (54.35 out of 100), SSD MobileNet v1 (51.68 out of 100), and SSD Inception v2 (50.72 out of 100), leading us to recommend Faster RCNN v2 as the overall solution. We then provide a roadmap for further work on integrating the object detector into our vision-based UAV tracking system.
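The 20/40/40 weighting of efficiency, accuracy, and consistency can be sketched as a simple weighted sum. This is an illustrative assumption: the function name and the normalization of each criterion to a 0-100 scale are ours, not taken from the paper, which describes its own scoring procedure.

```python
def composite_score(fps_score, rms_score, map_score,
                    weights=(0.20, 0.40, 0.40)):
    """Weighted combination of the three criteria, each assumed to be
    pre-normalized to a 0-100 scale (higher is better). The weights
    follow the 20%/40%/40% split for efficiency, accuracy, and
    consistency stated in the abstract."""
    w_eff, w_acc, w_con = weights
    return w_eff * fps_score + w_acc * rms_score + w_con * map_score
```

With identical weights and scaling, a detector scoring 100 on every criterion would receive a composite score of 100, and the accuracy and consistency terms together dominate the ranking.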
Highlights
Human beings can detect and track objects using their eyes
Accuracy is evaluated by taking the Root Mean Square (RMS) error between the location estimated by the object detection system and the ground truth location obtained from the Vicon motion-capture system along the side (x), height (y), and depth (z) directions
Consistency of an object detection system is measured using the mean Average Precision metric introduced in Section 2.3 and reflects the quality of the bounding box estimates provided by the Application Programming Interface (API)
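The two metrics above can be sketched in a few lines. This is a minimal illustration under our own assumptions about data layout (N x 3 position arrays and corner-format bounding boxes); the intersection-over-union function shows the overlap measure that underlies mean Average Precision, not the full mAP computation used in the paper.

```python
import numpy as np

def rms_error(estimated, ground_truth):
    """Per-axis RMS error between estimated and ground-truth target
    positions. Both inputs are N x 3 arrays whose columns are the
    side (x), height (y), and depth (z) directions."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(ground_truth, dtype=float)
    return np.sqrt(np.mean(diff ** 2, axis=0))

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corners -- the overlap criterion on which the
    mean Average Precision metric is built."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

In a typical mAP evaluation, a detection counts as a true positive when its IoU with a ground-truth box exceeds a threshold (often 0.5), and precision is then averaged over recall levels and classes.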
Summary
Human beings can detect and track objects using their eyes, and the human brain can differentiate between various kinds of objects, e.g., different animal species. This article presents work performed on the implementation and benchmarking of various machine learning algorithms for the task of detection and ranging of a target UAV using the video feed from a monocular camera equipped onboard a pursuer UAV. While other studies have been published on the testing and benchmarking of vision-based UAV detection and ranging, for instance [9,10,11], our study is unique in combining a broad choice of object detection algorithms (five candidates), access to exact ground truth provided by an indoor motion-capture system, and the use of the commercial Parrot AR.Drone 2.0 UAV, which brings the challenges of a difficult-to-spot frontal profile due to its protective styrofoam hull and the low-resolution video from its onboard camera.