Abstract

Geospatial object detection has been studied extensively, but many existing methods optimize for either accuracy or speed alone. Methods that are both fast and accurate are important in scenarios such as search and rescue and military information acquisition. Remote sensing images also contain targets that are small, have few textures, or have low contrast with the background, all of which make detection harder. In this paper, we propose an accurate and fast single shot detector (AF-SSD) for high-spatial-resolution remote sensing imagery to address these problems. First, we design a lightweight backbone to reduce the number of trainable parameters of the network. Within this backbone, we also use some wide and deep convolutional blocks to extract more semantic information and maintain high detection precision. Second, a novel encoding–decoding module is employed to detect small targets accurately. Using up-sampling and summation operations, the encoding–decoding module adds strong high-level semantic information to low-level features. Third, we design a cascade structure with spatial and channel attention modules for targets with low contrast (low-contrast targets) and few textures (few-texture targets). The spatial attention module extracts long-range features for few-texture targets. By weighting each channel of a feature map, the channel attention module guides the network to concentrate on easily identifiable features for low-contrast and few-texture targets. Experimental results on the NWPU VHR-10 dataset show that the proposed AF-SSD achieves superior detection performance: with 5.7 M parameters, it reaches 88.7% mAP at an average of 0.035 s per image on an NVIDIA GTX-1080Ti GPU.
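The two operations the abstract describes, adding up-sampled high-level features to low-level features and weighting each channel of a feature map, can be illustrated with a minimal sketch. This is not the AF-SSD implementation: the function names are hypothetical, nearest-neighbour 2x up-sampling is an assumed choice, and the channel weight here is simply a sigmoid of the channel's global average rather than the output of learned layers. Feature maps are plain nested lists of shape channels x height x width, and the two fused maps are assumed to have the same number of channels.

```python
import math


def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x up-sampling of a C x H x W feature map (assumed method)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
            rows.append(wide)
            rows.append(list(wide))  # repeat each row vertically
        out.append(rows)
    return out


def fuse_by_summation(low_level, high_level):
    """Up-sample the high-level map, then add it element-wise to the low-level map.

    Assumes low_level is exactly twice the spatial size of high_level
    and that both have the same number of channels.
    """
    up = upsample_nearest_2x(high_level)
    return [
        [[a + b for a, b in zip(row_low, row_up)]
         for row_low, row_up in zip(ch_low, ch_up)]
        for ch_low, ch_up in zip(low_level, up)
    ]


def channel_attention(fmap):
    """Scale each channel by a weight in (0, 1).

    The weight is a sigmoid of the channel's global average; a trained
    module would compute it with learned layers instead (assumption).
    """
    out = []
    for channel in fmap:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        weight = 1.0 / (1.0 + math.exp(-mean))
        out.append([[weight * v for v in row] for row in channel])
    return out
```

For example, `fuse_by_summation` applied to a 1 x 2 x 2 low-level map and a 1 x 1 x 1 high-level map broadcasts the single high-level value over the four low-level positions before adding, which is the sense in which coarse semantic information is injected into finer features.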

Highlights

  • Advances in remote sensing technology have made optical remote sensing images with high spatial resolution easy to obtain

  • We propose an accurate and fast single shot detector (AF-SSD) for high-spatial-resolution remote sensing imagery, which focuses on a lightweight backbone and on extracting effective features for small, few-texture, and low-contrast targets

  • We introduce a spatial attention module to capture contextual features for few-texture targets

Summary

Introduction

Advances in remote sensing technology have made optical remote sensing images with high spatial resolution easy to obtain. The analysis and understanding of remote sensing images has drawn wide attention in the last few years, with applications in search, traffic planning, rescue, and other areas. Object detection methods based on deep neural networks [4,5,6,7,8,9], especially convolutional neural networks (CNNs), have made great progress. Mainstream CNN-based object detection methods fall into two classes: two-stage algorithms [5,10,11,12,13,14] and one-stage algorithms [4,6,15,16,17,18].
