Abstract
The rapid development of high spatial resolution (HSR) remote sensing imaging techniques not only provides a considerable amount of data for scene classification tasks, but also calls for appropriate scene classification methods when only a limited number of labeled samples is available. AlexNet, a relatively simple convolutional neural network (CNN) architecture, has achieved great success in scene classification and has proven to be an excellent foundational technique for hierarchical, automatic scene classification. However, current HSR remote sensing imagery scene classification datasets are typically small and have relatively simple categories, and the limited number of annotated samples can easily prevent the network from converging. For HSR remote sensing imagery, multi-scale information about the same scene can represent the scene semantics to a certain extent, but an efficient way of fusing this information is lacking. Meanwhile, the current pre-trained AlexNet architecture lacks an appropriate form of supervision for enhancing the model's performance, which makes it prone to overfitting. In this paper, an improved pre-trained AlexNet architecture named pre-trained AlexNet-SPP-SS is proposed, which incorporates scale pooling, namely spatial pyramid pooling (SPP), and side supervision (SS) to address these two problems. Extensive experiments on the UC Merced dataset and the Google image dataset of SIRI-WHU demonstrate that the proposed pre-trained AlexNet-SPP-SS model is superior to both the original AlexNet architecture and traditional scene classification methods.
Highlights
The pre-trained AlexNet architecture is an effective end-to-end HSR remote sensing imagery scene classification framework, but it handles the classification task with only a final supervision term.
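As a rough illustration of what adding side supervision to the single final supervision term can look like, the sketch below sums the final cross-entropy loss with weighted cross-entropy losses from side classifiers attached to intermediate layers, in the style of deeply supervised networks. The function names, the fixed `side_weights`, and the use of cross-entropy for every side term are assumptions for illustration; the exact side-branch placement and weighting used by AlexNet-SPP-SS are not specified here.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits); logits has shape (N, K)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def side_supervised_loss(final_logits, side_logits, labels, side_weights):
    """Final-layer loss plus one weighted loss per (hypothetical) intermediate side classifier."""
    if len(side_logits) != len(side_weights):
        raise ValueError("need exactly one weight per side classifier")
    loss = softmax_cross_entropy(final_logits, labels)
    for logits, weight in zip(side_logits, side_weights):
        # Each side term pushes a gradient directly into the layer its classifier is attached to.
        loss += weight * softmax_cross_entropy(logits, labels)
    return float(loss)


# Usage: two samples, three classes, one side head on an intermediate layer.
labels = np.array([0, 2])
final = np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]])
side = [np.array([[1.0, 0.5, 0.2], [0.3, 0.2, 1.1]])]
print(side_supervised_loss(final, side, labels, side_weights=[0.3]))
```

Because each side term contributes a gradient directly at an intermediate layer, lower layers receive a training signal that does not have to pass through the whole network, which is the usual intuition for why intermediate supervision can ease gradient vanishing and act as a regularizer.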
Summary
With the recent launch of remote sensing satellites around the world, a large volume of multi-level, multi-angle, and multi-resolution HSR remote sensing images can be obtained, and this remote sensing big data brings new understandings to the traditional definition of big data [1,2,3]. To better handle and fuse the multi-scale information in the convolved feature maps of HSR remote sensing scene images, a multi-scale pooling strategy named spatial pyramid pooling (SPP) [13,41,42,43] is incorporated into the end-to-end pre-trained AlexNet classification architecture. SPP addresses the multi-scale scene interpretation task by fusing the convolved feature maps at different scales, which takes the spatial information at each scale into account and increases the scene interpretation ability. In addition, the SS strategy is incorporated by introducing intermediate supervision to the layers of the pre-trained AlexNet architecture. This makes the architecture more transparent in dealing with heterologous parameter transfer in quantity-limited HSR remote sensing imagery scene classification, reduces the gradient vanishing phenomenon, and prevents overfitting of the whole architecture.
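The pooling step described above can be sketched as follows: a convolved feature map of shape (C, H, W) is max-pooled over an n x n grid for each pyramid level, and the per-bin results are concatenated into one vector whose length is C times the total number of bins, independent of H and W. The pyramid levels (1, 2, 4), the use of max pooling, and the function name are assumptions for illustration, not the configuration reported for AlexNet-SPP-SS.

```python
import math

import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each level n and
    concatenate the bins into a fixed-length vector of size C * sum(n * n)."""
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # floor/ceil bin edges cover every pixel even when H is not divisible by n;
            # neighbouring bins may then overlap by one row.
            h0, h1 = math.floor(i * height / n), math.ceil((i + 1) * height / n)
            for j in range(n):
                w0, w1 = math.floor(j * width / n), math.ceil((j + 1) * width / n)
                # One value per channel for this bin.
                pooled.append(feature_map[:, h0:h1, w0:w1].max(axis=(1, 2)))
    return np.concatenate(pooled)


# Usage: feature maps of different spatial sizes yield vectors of the same length.
rng = np.random.default_rng(0)
for h, w in [(13, 13), (9, 17)]:
    fmap = rng.standard_normal((256, h, w))
    print(spatial_pyramid_pool(fmap).shape)  # (5376,) for both: 256 * (1 + 4 + 16)
```

The fixed output length is what lets the pooled vector feed fully connected layers regardless of the input size, while the coarse bins summarize global context and the finer bins keep more local spatial layout.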