Abstract

The accuracy and efficiency of scene classification have improved substantially with the widespread application of deep convolutional neural networks (CNNs). However, standard CNNs classify images mostly on the basis of global features from the last fully connected layer, which can lead them to neglect discriminative local information and makes them sensitive to spatial transformations. In this article, we consider scene classification from the perspective of multiple instance learning (MIL) and propose an end-to-end multiple instance CNN (MI-CNN) for learning more robust scene representations. In MI-CNN, a scene is represented as a bag of local patches (instances). An instance-level classifier is trained in an MIL fashion to predict the label of each patch, which makes the classifier more sensitive to discriminative local patches. The patch labels are then aggregated into an image label by an MIL pooling layer, which is invariant to the order of the local patches and helps construct more robust representations. We present extensive experiments on the UC Merced Land Use (UCM), Aerial Image Data Set (AID), and NWPU-RESISC (NWPU) data sets. Experimental results show that the proposed method achieves accuracy improvements of 1.17%, 1.70%, and 3.61% with a 90% reduction in parameters compared with standard CNNs.
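
The aggregation step described above, in which per-patch predictions are pooled into one image-level prediction regardless of patch order, can be illustrated with a minimal sketch. The abstract does not state which pooling operator MI-CNN uses, so the three operators below (mean, max, and log-sum-exp) are common MIL choices assumed for illustration only. The names `softmax`, `mil_pool`, and the sharpness parameter `r` are hypothetical, and plain Python lists stand in for the output of the instance-level classifier.

```python
import math
import random


def softmax(logits):
    """Turn one patch's class logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mil_pool(instance_scores, mode="mean", r=5.0):
    """Pool per-patch scores (num_patches x num_classes) into one bag-level vector.

    The operator is an assumption; the abstract only says the pooling layer
    is invariant to the order of the patches.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    num_classes = len(instance_scores[0])
    # Regroup the scores so that each inner list holds one class across all patches.
    columns = [[row[c] for row in instance_scores] for c in range(num_classes)]
    if mode == "mean":
        return [math.fsum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "lse":
        # Log-sum-exp: (1/r) * log(mean(exp(r * x))), a smooth value between mean and max.
        pooled = []
        for col in columns:
            m = max(col)
            mean_exp = math.fsum(math.exp(r * (x - m)) for x in col) / len(col)
            pooled.append(m + math.log(mean_exp) / r)
        return pooled
    raise ValueError(f"unknown pooling mode: {mode}")


# Hypothetical logits for four patches and three scene classes.
patch_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.1], [-0.5, 3.0, 0.0], [1.5, 0.0, 0.2]]
patch_scores = [softmax(logits) for logits in patch_logits]

# Shuffling the patches should leave the bag-level scores unchanged.
shuffled_scores = patch_scores[:]
random.Random(0).shuffle(shuffled_scores)

bag_scores = mil_pool(patch_scores, mode="mean")
predicted_class = max(range(len(bag_scores)), key=bag_scores.__getitem__)
```

Because each operator reduces over the set of patches rather than over their positions, reordering the bag does not change the result, which is the order-invariance property the abstract attributes to the MIL pooling layer. In an end-to-end model the pooling would need to be differentiable, which holds for all three operators sketched here (max through its subgradient).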

