ABSTRACT Remote sensing images contain many small objects that are difficult to detect because the objects are densely distributed and the background is complex. To address small object detection in remote sensing images, this paper proposes an object detection algorithm named CotYOLO-v3. First, we redesign the backbone Darknet-53 by replacing its residual blocks with Contextual Transformer (Cot) blocks, which extract contextual information for small objects and enhance the visual representation. Second, before the feature fusion stage of YOLO-v3, we introduce shallow-layer features processed by an attention mechanism, which reduces background interference and improves the representational ability of the network. Third, we optimize the feature fusion process: we replace the up-sampling method with sub-pixel convolution, and we replace the first convolution layer of the prediction branch with a residual block. Finally, we use the K-Medians clustering algorithm to regenerate anchors suited to remote sensing image datasets. We compare CotYOLO-v3 with commonly used object detection algorithms on small objects in the DIOR dataset. The experimental results show that CotYOLO-v3 has clear advantages over these algorithms in detecting small objects in remote sensing images, and that it improves the mean Average Precision (mAP) by 5.07% over the original YOLO-v3.
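The abstract does not describe how the K-Medians anchor step works. The following is a minimal Python sketch under stated assumptions, not the authors' code. Each ground-truth box is reduced to its (width, height). The distance between a box and an anchor is 1 - IoU, with both boxes aligned at a common corner; this is the convention YOLOv2 introduced for anchor clustering and YOLO-v3 reused. Each anchor is updated to the per-dimension median of its members, which is what distinguishes K-Medians from K-Means. The area-sorted initialization and the names `iou_wh` and `kmedians_anchors` are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box, in pixels


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same corner, so only width and height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmedians_anchors(boxes: List[Box], k: int, max_iter: int = 100) -> List[Box]:
    """Cluster box sizes into k anchors using distance 1 - IoU and median centres.

    Hypothetical sketch; the paper's exact initialization and stopping rule are unknown.
    """
    if k > len(boxes):
        raise ValueError("need at least k boxes")
    # Assumed deterministic init: pick k boxes spread evenly over the area-sorted list.
    by_area = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(by_area) / k
    centres = [by_area[int(i * step)] for i in range(k)]

    for _ in range(max_iter):
        # Assignment step: each box joins the anchor with the smallest 1 - IoU distance.
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda c: 1.0 - iou_wh(b, centres[c]))
            clusters[j].append(b)
        # Update step: per-dimension median of the members (K-Medians, not K-Means).
        # An empty cluster keeps its previous centre.
        new_centres = [
            (median(w for w, _ in cl), median(h for _, h in cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:  # converged: no centre moved
            break
        centres = new_centres

    # Return anchors from smallest to largest area, the order in which they are
    # split across the detection scales.
    return sorted(centres, key=lambda b: b[0] * b[1])


if __name__ == "__main__":
    # Toy (width, height) data with three obvious size groups.
    boxes = [(8, 9), (10, 12), (12, 10), (40, 38), (42, 45), (44, 40),
             (120, 110), (130, 125), (118, 140)]
    print(kmedians_anchors(boxes, k=3))
```

YOLO-v3 uses nine anchors (three per detection scale), so `k=9` on the real training boxes would be the analogous setting; the toy call uses `k=3` only to keep the example small. Using the median instead of the mean makes each anchor less sensitive to a few unusually large or oddly shaped boxes within a cluster.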