Abstract

Detecting small targets in aerial images remains difficult because the targets have low resolution and often resemble the background. Recent advances in object detection have produced efficient, high-performance detectors; among them, the YOLO series is a representative family that is lightweight and performs well. In this paper, we propose a method to improve small target detection in aerial images by modifying YOLOv5. First, we modify the backbone by applying an efficient channel attention module and propose a channel attention pyramid method; we call the resulting network efficient channel attention pyramid YOLO (ECAP-YOLO). Second, to optimize the detection of small objects, we remove the module for detecting large objects and add a detect layer for finding smaller objects, which reduces the computing power used for detecting small targets and improves the detection rate. Finally, we use transposed convolution instead of upsampling. Compared with the original YOLOv5, the proposed method improves mAP by 6.9% on the VEDAI dataset, 5.4% when detecting small cars in the xView dataset, 2.7% when detecting the small vehicle and small ship classes in the DOTA dataset, and approximately 2.4% when finding small cars in the Arirang dataset.

Highlights

  • Research on deep learning-based object detection has progressed steadily [1]

  • The ECAP-YOLO and ECAPs-YOLO networks proposed in this paper showed a 6.9% improvement in mean average precision on the VEDAI dataset

  • The Squeeze-and-Excitation Network (SE-Net) introduced the idea of squeeze and excitation: a squeeze operation summarizes the global information of each feature map, and an excitation operation scales the importance of each feature map
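
The squeeze and excitation steps described in the last highlight can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`squeeze`, `dense`, `excitation`, `se_block`), the nested-list layout (channels, then rows, then values), and the bias-free layers are all choices made here for brevity, and the weight matrices are placeholders that a real network would learn.

```python
import math


def squeeze(feature_maps):
    """Squeeze: global average pooling, giving one scalar per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def dense(vec, weights):
    """Bias-free fully connected layer (a simplification made for this sketch).

    weights[j] holds the input weights that produce output j.
    """
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excitation(descriptor, w_reduce, w_expand):
    """Excitation: reduce -> ReLU -> expand -> sigmoid, one scale per channel."""
    hidden = [max(0.0, h) for h in dense(descriptor, w_reduce)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w_expand)]


def se_block(feature_maps, w_reduce, w_expand):
    """Rescale every channel by the importance computed from its descriptor."""
    scales = excitation(squeeze(feature_maps), w_reduce, w_expand)
    return [
        [[s * x for x in row] for row in fmap]
        for fmap, s in zip(feature_maps, scales)
    ]
```

Calling `se_block(x, w_reduce, w_expand)` with `x` holding C channels, `w_reduce` of shape (C/r) x C, and `w_expand` of shape C x (C/r) returns feature maps of the same shape as `x`, with each channel multiplied by a scale between 0 and 1. The reduction ratio r is a hyperparameter chosen by the user.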

Summary

Introduction

Research on deep learning-based object detection has progressed steadily [1]. A one-stage detector performs localization, which determines the location of an object, and classification, which identifies the object, simultaneously, whereas a two-stage detector [2,3,4] performs the two steps sequentially. As a result, one-stage object detection has the advantage of being faster than two-stage object detection. Various deep learning networks exist for one-stage object detection, such as SSD [5], EfficientDet [6], and the YOLO series [7,8,9,10].
