Abstract

Object detection in remote sensing images (RSIs) plays a crucial role in aerial and satellite image analysis. Existing methods struggle to detect small and multi-scale objects in RSIs, which makes a good trade-off between speed and accuracy difficult to achieve. Our investigation indicates that state-of-the-art detectors have largely overlooked two critical aspects: spatial artifacts from convolution operations, and gradient confusion caused by neighboring levels in the Feature Pyramid Network. To address the first aspect, we propose adopting a non-reorganized patch-embedding layer in the downsampling stage and a dual-path learning network (DPLNet) as the backbone, which effectively mitigates the adverse effects of edge-pixel feature bias in feature maps. Using DPLNet as the backbone also keeps computational cost low while learning the intrinsic feature information of objects in RSIs. To address the second aspect, we propose a neighbor-erasing module with only one gradient flow (OGF-NEM). This module uses deep features to erase large objects and thereby highlight small objects in shallow features, and it changes the backpropagation path to prevent the backflow of unreasonable gradients and the erosion of information from neighboring scales. The resulting detector, called SDSDet, achieves excellent performance on small, dense, and multi-scale objects in RSIs. We have conducted extensive experiments on the DOTA and MS COCO datasets. SDSDet achieves 42.8% AP on DOTA and 33.3% AP on MS COCO, with a model size of roughly 4.87 M and a speed of 95 FPS.
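The neighbor-erasing idea with a single gradient flow can be illustrated with a minimal PyTorch sketch. The abstract does not give the module's formulation, so everything here is an assumption made for illustration: the function name `erase_neighbor`, the channel-mean sigmoid gate, nearest-neighbor upsampling, and the use of `detach()` to cut the gradient path into the deeper level. It is not the authors' implementation of OGF-NEM.

```python
import torch
import torch.nn.functional as F


def erase_neighbor(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    """Suppress responses in `shallow` where the neighboring `deep` level is active.

    shallow: (N, C, H, W) higher-resolution feature map
    deep:    (N, C, h, w) lower-resolution feature map from the next pyramid level
    Hypothetical helper: the gate design below is an assumption.
    """
    # Assumption: cutting the graph here stands in for "only one gradient flow",
    # so the erasing step cannot push gradients back into the deep level.
    deep_detached = deep.detach()
    # Bring the deep map to the shallow map's spatial size.
    up = F.interpolate(deep_detached, size=shallow.shape[-2:], mode="nearest")
    # Soft spatial gate in (0, 1): high where the deep level responds strongly.
    gate = torch.sigmoid(up.mean(dim=1, keepdim=True))
    # Erase the gated (large-object) part of the shallow features.
    return shallow - shallow * gate


shallow = torch.randn(1, 8, 16, 16, requires_grad=True)
deep = torch.randn(1, 8, 8, 8, requires_grad=True)
out = erase_neighbor(shallow, deep)
out.sum().backward()
```

After the backward pass in this sketch, `shallow.grad` is populated while `deep.grad` stays `None`, which is the behavior the single gradient path is meant to illustrate.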
