Geospatial object detection has been studied extensively. However, many existing methods focus on either accuracy or speed alone. Methods that are both fast and accurate are of great importance in scenarios such as search and rescue and military information acquisition. In remote sensing images, some targets are small, have few textures, and show low contrast against the background, which poses challenges for object detection. In this paper, we propose an accurate and fast single shot detector (AF-SSD) for high spatial resolution remote sensing imagery to address these problems. First, we design a lightweight backbone to reduce the number of trainable parameters of the network. In this lightweight backbone, we also use some wide and deep convolutional blocks to extract more semantic information and maintain high detection precision. Second, a novel encoding–decoding module is employed to detect small targets accurately. With up-sampling and summation operations, the encoding–decoding module adds strong high-level semantic information to low-level features. Third, we design a cascade structure with spatial and channel attention modules for targets with low contrast (named low-contrast targets) and few textures (named few-texture targets). The spatial attention module extracts long-range features for few-texture targets. By weighting each channel of a feature map, the channel attention module guides the network to concentrate on easily identifiable features for low-contrast and few-texture targets. Experimental results on the NWPU VHR-10 dataset show that the proposed AF-SSD achieves superior detection performance, with 5.7 M parameters, an mAP of 88.7%, and an average of 0.035 s per image on an NVIDIA GTX-1080Ti GPU.
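The two operations named above, up-sampling with summation and per-channel weighting, can be illustrated with a minimal pure-Python sketch. It is not the AF-SSD implementation: the function names are hypothetical, nearest-neighbour 2x up-sampling is assumed, and the channel weights are assumed to be a plain sigmoid of each channel's global average, with no learned layers, since the abstract does not specify how the weights are computed.

```python
import math


def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map (list of rows) by repeating values."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def fuse_by_summation(low_level, high_level):
    """Up-sample the coarser high-level map and add it element-wise to the low-level map."""
    up = upsample_nearest_2x(high_level)
    if len(up) != len(low_level) or len(up[0]) != len(low_level[0]):
        raise ValueError("low-level map must be exactly twice the size of the high-level map")
    return [[a + b for a, b in zip(r_low, r_up)] for r_low, r_up in zip(low_level, up)]


def channel_weights(channels):
    """Return one weight in (0, 1) per channel.

    Assumption: weight = sigmoid(global average of the channel).
    """
    weights = []
    for ch in channels:
        total = sum(sum(row) for row in ch)
        count = sum(len(row) for row in ch)
        weights.append(1.0 / (1.0 + math.exp(-(total / count))))
    return weights


def apply_channel_attention(channels):
    """Scale every value in each channel by that channel's weight."""
    weights = channel_weights(channels)
    return [[[w * v for v in row] for row in ch] for w, ch in zip(weights, channels)]


if __name__ == "__main__":
    low = [[1.0] * 4 for _ in range(4)]   # fine-resolution, low-level features
    high = [[1.0, 2.0], [3.0, 4.0]]       # coarse-resolution, high-level features
    print(fuse_by_summation(low, high))
    print(channel_weights([low, high]))
```

In this sketch the fused map keeps the resolution of the low-level features while carrying the values of the coarser map, and channels with a larger average response receive a weight closer to 1. A trained network would learn the weighting rather than derive it directly from the average.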