Abstract

The Vision Transformer (ViT) has been introduced into high-resolution synthetic aperture radar (HR SAR) image classification due to its excellent global feature extraction ability. However, the small sample sizes of SAR image datasets make it difficult to fit a ViT with its excessive number of trainable parameters, which easily results in over-fitting during training. Meanwhile, the ViT's poor capability in capturing local features limits its accuracy in SAR image classification. To solve these problems, this letter proposes a new Lightweight Attention-Discarding Transformer (LAD Transformer) for the classification of HR SAR images. In the proposed model, the backbone of the advanced Swin Transformer is used to model global information and extract hierarchical features. Moreover, the vital feature-extraction part of the LAD Transformer completely discards the self-attention mechanism and instead extracts local features of SAR images with a lighter block built from group convolution and channel shuffle (GC-CS Block). In addition, to address the estimation shift caused by consecutive batch normalization (BN) layers, a new composite normalization method combining batch normalization and layer normalization (BLN) is proposed for the GC-CS Block. Experiments show that the proposed network has fewer parameters and achieves higher classification accuracy on two real HR SAR datasets.
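The abstract does not give the internals of the GC-CS Block, but the channel-shuffle operation it relies on (popularized by ShuffleNet to let information flow between convolution groups) can be sketched as a simple reshape-transpose-reshape. The following NumPy snippet is a minimal illustration of that operation only, not the paper's implementation; the function name and the tiny demo tensor are illustrative assumptions.

```python
import numpy as np

def channel_shuffle(x, groups):
    # Channel shuffle as used after a group convolution:
    # x has shape (N, C, H, W); C must be divisible by `groups`.
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # Split channels into (groups, C//groups), swap those two axes,
    # then flatten back so channels from different groups interleave.
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# Tiny demo: 6 channels in 2 groups [0,1,2 | 3,4,5] interleave to [0,3,1,4,2,5].
x = np.arange(6).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 2).reshape(-1).tolist())  # → [0, 3, 1, 4, 2, 5]
```

After this shuffle, each channel group in the next group convolution sees channels originating from every previous group, which is what lets stacked group convolutions stay cheap without isolating information inside groups.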
