Abstract

The spatial distribution of remote sensing scene images is highly complex, so extracting local key semantic information and discriminative features is essential for accurate classification. However, most existing convolutional neural network (CNN) models tend to rely on global feature representations and lose shallow features. In addition, when the network is too deep, vanishing gradients and overfitting tend to occur. To solve these problems, this paper proposes MILRDA, a lightweight multi-instance CNN model for remote sensing scene classification. In the instance extraction and classifier part, the constructed residual dense attention block (RDAB) extracts more discriminative features while retaining shallow features. The extracted features are then transformed into instance-level vectors, and the proposed channel-attention-based multi-instance pooling highlights the local information associated with bag-level labels while suppressing the weights of useless objects and backgrounds. Finally, the network is trained with the cross-entropy loss function to output the final prediction results. Experimental results on four public datasets show that the proposed method achieves results comparable to those of other state-of-the-art methods. Moreover, visualization of the feature maps shows that MILRDA can find more effective features.
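
The pooling step described above can be illustrated with a minimal sketch of attention-weighted multi-instance pooling. This is not the MILRDA implementation: the abstract does not give the exact form of the channel-attention-based pooling, so the sketch assumes the simplest variant, in which each instance (a local region of the scene) has a vector of class scores and one scalar attention logit. The names `softmax`, `attention_mil_pooling`, and `cross_entropy`, as well as the toy numbers, are hypothetical and chosen only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pooling(instance_scores, attention_logits):
    """Aggregate instance-level class scores into bag-level probabilities.

    instance_scores: N instances, each a list of C class scores (assumed input).
    attention_logits: N raw attention scores, one per instance (assumed input).
    Returns (bag_probs, weights).
    """
    # Normalise attention so that informative instances dominate the bag
    # and background instances receive small weights.
    weights = softmax(attention_logits)
    num_classes = len(instance_scores[0])
    # Bag-level score per class is the attention-weighted sum over instances.
    bag_scores = [
        sum(w * inst[c] for w, inst in zip(weights, instance_scores))
        for c in range(num_classes)
    ]
    return softmax(bag_scores), weights


def cross_entropy(probs, label):
    """Cross-entropy loss for a single bag with an integer class label."""
    return -math.log(probs[label])


# Toy bag: three instances, two classes. The first instance carries the
# scene-relevant evidence; the other two behave like background.
instance_scores = [[2.0, 0.1], [0.2, 0.3], [0.1, 0.2]]
attention_logits = [3.0, 0.0, 0.0]

bag_probs, weights = attention_mil_pooling(instance_scores, attention_logits)
loss = cross_entropy(bag_probs, label=0)
```

In this toy bag the first instance receives most of the attention weight, so the bag-level prediction follows its class scores and the background instances contribute little. In a trained network the attention logits would be learned rather than set by hand.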
