Abstract
To address the shortcomings of current methods in detecting small objects in image-based remote sensing applications, we propose a novel implementation of the single shot multibox detector (SSD) network based on dilated convolution and feature fusion, which we call the dilated convolution and feature fusion single shot multibox detector (DFSSD). The algorithm removes the random clipping step from the data preprocessing layers of the conventional SSD network and uses the structure of the feature pyramid network (FPN) to fuse the high-resolution low-level feature map with the semantically rich high-level feature map. It also enlarges the receptive field of the third-level feature map of the DFSSD network by using dilated convolution. In the data processing step, we use image segmentation of the feature point region proposals to increase the training sample size. When tested on remote sensing datasets, the proposed DFSSD network achieves a mean average precision (mAP) of 76.51%, significantly higher than that of the SSD model (69.81%).
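The FPN-style fusion described in the abstract can be illustrated with a minimal sketch. It assumes single-channel feature maps stored as nested lists, nearest-neighbour 2x upsampling, and element-wise addition. The function names `upsample_nearest_2x` and `fuse` are hypothetical, and the actual DFSSD fusion operator (for example deconvolution, lateral 1x1 convolutions, or concatenation) is not specified in this text, so this should be read as an illustration of the idea rather than the authors' implementation.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(low_level, high_level):
    """Add an upsampled high-level map to a low-level map (assumed fusion rule)."""
    up = upsample_nearest_2x(high_level)
    if len(up) != len(low_level) or len(up[0]) != len(low_level[0]):
        raise ValueError("low-level map must be exactly 2x the high-level map")
    return [[a + b for a, b in zip(r_low, r_up)]
            for r_low, r_up in zip(low_level, up)]


# Illustrative values only: a 4x4 high-resolution map and a 2x2 semantic map.
low = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
high = [[10, 20],
        [30, 40]]

fused = fuse(low, high)
for row in fused:
    print(row)
```

Each position in the fused map keeps the fine spatial detail of the low-level map while inheriting the value of the coarser high-level cell that covers it, which is the intuition behind combining resolution with semantic information.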
Highlights
Object detection has always been a research hotspot in the field of computer vision [1].
There exist many excellent object detection methods based on deep learning architectures and platforms, such as AlexNet [7], ZFNet [8], VGGNet [9], GoogLeNet [10], R-CNN [11], Faster R-CNN [12], and SSD [13]. Among them, the single shot multibox detector (SSD) model is a network architecture based on convolutional neural networks (CNNs) with relatively high accuracy and near real-time performance.
We call this method the dilated convolution and feature fusion single shot multibox detector (DFSSD); it enlarges the receptive field of the feature layer and enriches its semantic information.
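To make the receptive-field claim concrete, the sketch below shows how dilation widens the span covered by a kernel without adding weights. For a kernel of size k and dilation rate d, the effective kernel size is k + (k - 1)(d - 1). The helper names, the 1D setting, and the dilation rate of 2 are illustrative assumptions; the dilation rate used in DFSSD is not stated in this text.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of input positions covered by one dilated kernel application."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded), stride-1 dilated cross-correlation in 1D."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` positions apart in the input.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 inputs
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees a span of 5

print(effective_kernel_size(3, 1), dense)
print(effective_kernel_size(3, 2), dilated)
```

With the same three weights, the dilated kernel compares samples four positions apart instead of two, so each output responds to a wider context at no extra parameter cost.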
Summary
Object detection has always been a research hotspot in the field of computer vision [1]. There exist many excellent object detection methods based on deep learning architectures and platforms, such as AlexNet [7], ZFNet [8], VGGNet [9], GoogLeNet [10], R-CNN [11], Faster R-CNN [12], and SSD [13]. Among them, the single shot multibox detector (SSD) model is a network architecture based on convolutional neural networks (CNNs) with relatively high accuracy and near real-time performance. The framework augments a CNN with handcrafted features (instead of using a DBN-based architecture) for classification. This method achieves superior performance on the SAT-4 and SAT-6 datasets, with accuracies of 99.90% and 99.84%, respectively. Duarte et al. [18] proposed three multi-resolution CNN feature fusion methods to improve the classification accuracy of building damage in remote sensing images, reaching an accuracy of 88.7% on the satellite and aerial (unmanned) cases. The performance of the proposed DFSSD network on remote sensing datasets comprising 700 aircraft and 938 car images is not inferior to that of networks of the same type, while its mAP is 4% higher than that of the original SSD network model.