Abstract
The Vision Transformer (ViT) has been introduced into high-resolution synthetic aperture radar (HR SAR) image classification because of its excellent global feature extraction ability. However, the small sample sizes of SAR image datasets make it difficult to fit a ViT with its large number of trainable parameters, which easily results in over-fitting during training. Meanwhile, the ViT's poor capability in capturing local features limits its accuracy in SAR image classification. To solve these problems, this letter proposes a new Lightweight Attention-Discarding Transformer (LAD Transformer) for the classification of HR SAR images. In the proposed model, the backbone of the Swin Transformer is used to model global information and extract hierarchical features. Moreover, the core feature extraction part of the LAD Transformer completely discards the self-attention mechanism and instead extracts local features of SAR images with a lighter block built from group convolution and channel shuffle (GC-CS Block). In addition, to address the estimation shift caused by consecutive batch normalization (BN) layers, a new composite normalization method that combines batch normalization and layer normalization (BLN) is proposed for the GC-CS Block. Experiments on two real HR SAR datasets show that the proposed network has fewer parameters and higher classification accuracy.
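The abstract does not give the internal layout of the GC-CS Block, so the following is only a minimal sketch of the two generic operations it names, not the authors' implementation. It assumes the standard definitions: a group convolution splits input and output channels into `groups` independent sets, and a channel shuffle (in the ShuffleNet style) interleaves channels across groups so that information can flow between them. The function names `channel_shuffle` and `conv_weight_count`, and the channel and kernel sizes used, are hypothetical and chosen for illustration.

```python
def channel_shuffle(channels, groups):
    """Interleave a flat list of channels across `groups` groups.

    Equivalent to viewing the list as (groups, per_group),
    transposing to (per_group, groups), and flattening.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def conv_weight_count(c_in, c_out, kernel, groups=1):
    """Number of weights (bias excluded) in a 2-D convolution.

    Each output channel only sees c_in / groups input channels,
    so the count shrinks by a factor of `groups`.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channels must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]

# Illustrative sizes only (not taken from the letter).
print(conv_weight_count(96, 96, 3, groups=1))  # 82944
print(conv_weight_count(96, 96, 3, groups=4))  # 20736
```

With these illustrative sizes, using four groups cuts the convolution weights by a factor of four, and the shuffle ensures that the next grouped layer receives channels from every group rather than only its own.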