Abstract

Remote sensing images of large scenes have complex backgrounds, and the targets in them vary in type, size, and posture, which makes object detection in such images difficult. To address this problem, this paper proposes an end-to-end multi-size object detection method based on a dual attention mechanism. First, a MobileNets backbone network extracts multi-layer features from the remote sensing images as the input of MFCA, a multi-size feature concentration attention module. MFCA employs an attention mechanism to suppress noise, enhance effective feature reuse, and improve the adaptability of the network to multi-size target features through multi-layer convolution operations. Then, TSDFF (two-stage deep feature fusion module) deeply fuses the feature maps output by MFCA to maximize the correlation between the feature sets and, in particular, to improve the feature expression of small targets. Next, GLCNet (global-local context network) and SSA (significant simple attention module) distinguish the fused features and screen out useful channel information, which makes the detected features more representative. Finally, the loss function is improved so that it better reflects the difference between the candidate frames and the real frames, enhancing the network's ability to predict complex samples. The proposed method is compared with other advanced algorithms on the NWPU VHR-10, DOTA, and RSOD open datasets. Experimental results show that the proposed method achieves the best AP (average precision) and mAP (mean average precision), indicating that it can accurately detect multi-type, multi-size, and multi-posture targets with high adaptability.
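The abstract reports results as AP and mAP without stating the exact evaluation protocol. As a rough illustration only, the sketch below computes AP for one class as the area under the precision-recall curve (all-point interpolation, one common convention) and mAP as the mean over classes. The function names, the input format (a confidence score plus a flag saying whether the detection matched a ground-truth box), and the class names in the usage example are assumptions made here, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, True if it matched a ground-truth box).
# How a match is decided (e.g. an IoU threshold) is assumed to happen upstream.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the precision-recall curve for a single class."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_hit in ranked:
        if is_hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles formed wherever recall increases.
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of the per-class AP values; per_class maps name -> (detections, num_gt)."""
    aps = [average_precision(dets, num_gt) for dets, num_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    toy = {
        "airplane": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "ship": ([(0.9, True), (0.8, True)], 2),
    }
    print(mean_average_precision(toy))
```

Under this convention a false positive ranked above a true positive lowers AP, so the metric rewards both finding the targets and ranking them confidently, which is why it is a natural summary for the multi-type, multi-size detection setting described above.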

Highlights

  • With the development of remote sensing satellites, unmanned aerial vehicles, and other technologies, the amount of remote sensing image data that can be obtained has exploded

  • To further evaluate the detection ability of our proposed method for multi-type, multi-size, and multi-posture targets in large-scale databases, experiments are conducted on the DOTA dataset

  • As for our proposed method, the resulting visual attention map of the target in the remote sensing image of the large scene with a complex background is closer to the ground truth


Summary

INTRODUCTION

With the development of remote sensing satellites, unmanned aerial vehicles, and other technologies, the amount of remote sensing image data that can be obtained has exploded.

Wang et al.: Multi-Size Object Detection in Large Scene Remote Sensing Images Under Dual Attention Mechanism

This may lead to more false-positive targets and increase the false alarm rate. Second, because the targets are densely distributed and small, and because the targets to be detected differ in type, scale, and posture, many positive samples will not be detected, which increases the false-negative rate. In addition, the imaging quality of remote sensing images is not as good as that of images captured by digital cameras, and the resolution is low. This further increases the difficulty of object detection.

RELATED WORKS
TWO-STAGE DEEP FEATURE FUSION
GLOBAL-LOCAL CONTEXT NETWORK
SIGNIFICANT SIMPLE ATTENTION MODULE
LOSS FUNCTION DESIGN
EXPERIMENTAL SETTING AND PERFORMANCE EVALUATION INDEX
ABLATION EXPERIMENT
CONCLUSION AND FUTURE WORK
