Abstract

Aerial scene recognition (ASR) has attracted great attention due to its increasingly important applications. Most ASR methods adopt a multi-scale architecture because both global and local features play important roles in ASR. However, existing multi-scale methods neglect effective interactions among different scales and spatial locations when fusing global and local features, which limits their ability to handle the large scale variation and complex backgrounds found in aerial scene images. In addition, existing methods may generalise poorly because of their millions of learnable parameters and the inconsistent predictions made from global and local features. To tackle these problems, this study proposes a scale-wise interaction fusion and knowledge distillation (SIF-KD) network that learns robust and discriminative features that are scale-invariant and background-independent. The study makes two main contributions. First, a global-local feature collaborative learning scheme is devised to extract scale-invariant features and thereby address the large scale variation in aerial scene images. Specifically, a plug-and-play multi-scale context attention fusion module is proposed for collaboratively fusing the context information of global and local features. Second, a scale-wise knowledge distillation scheme is proposed to produce more consistent predictions by distilling the predictive distributions between different scales during training. Comprehensive experimental results show that the proposed SIF-KD network achieves the best overall accuracy among state-of-the-art methods, with 99.68%, 98.74% and 95.47% on the UCM, AID and NWPU-RESISC45 datasets, respectively.
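
The abstract does not give the form of the scale-wise distillation loss, so the following is only an illustrative sketch of how predictive distributions from two scales could be pulled towards each other. It assumes a symmetric KL divergence between temperature-softened softmax outputs; the function names, the temperature value and the symmetric form are all assumptions made for illustration, not the formulation used in SIF-KD.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def scale_wise_kd_loss(global_logits, local_logits, temperature=4.0):
    """Hypothetical consistency loss between global- and local-scale predictions.

    Assumption: symmetric KL between softened distributions, scaled by T^2
    as is common in knowledge distillation. The paper may use another form.
    """
    p_global = softmax(global_logits, temperature)
    p_local = softmax(local_logits, temperature)
    sym_kl = kl_divergence(p_global, p_local) + kl_divergence(p_local, p_global)
    return 0.5 * temperature ** 2 * sym_kl


# Example: logits for three scene classes predicted at two scales.
agreeing = scale_wise_kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
conflicting = scale_wise_kd_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In this sketch the loss is zero when both scales produce the same distribution and grows as their predictions diverge, which is the behaviour a consistency term of this kind is meant to have.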
