Land use classification remains a significant challenge in remote sensing semantic segmentation. Although convolutional neural networks (CNNs) are widely used, inherent limitations such as restricted receptive fields hinder their application in remote sensing. In addition, the scarcity of labeled remote sensing data and domain shift degrade the performance of deep learning models. This study proposes a hierarchical transfer learning framework for fine-category semantic segmentation that leverages the global relationship modeling capability of Transformer models to classify land use in Dongpo District, Meishan City, mainland China. The framework performs multilevel transfer learning, progressing from non-remote sensing classification to coarse remote sensing classification and then to refined remote sensing classification. We compared Transformer models with representative baseline CNNs such as U-Net and DeepLab V3+. Swin-Unet outperformed the other models in this study, achieving the highest test mean intersection over union (MIoU) of 0.837 for residential and 0.810 for transportation in level 1 (coarse) classification, and 0.545 for irrigated land in level 2 (fine-grained) classification. Compared with random parameter initialization, transfer learning from pre-trained models improved test MIoU by 0.4% to 17.7%, with the largest gain (17.7%) for the public land category. Using pre-trained level 1 models, the hierarchical transfer learning framework further improved segmentation accuracy for the corresponding level 2 categories. Our study demonstrates the applicability of Transformer-based transfer learning to remote sensing land use classification.
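The results above are reported as intersection over union (IoU) scores. As a minimal sketch, assuming IoU is computed per class from a pixel-level confusion matrix and MIoU is the unweighted mean over classes (the abstract does not state the exact averaging scheme), the metric can be computed as follows. The function names are hypothetical and for illustration only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN); NaN if class c never appears."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                      # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious


def mean_iou(ious):
    """Unweighted mean over classes, ignoring NaN (absent) classes."""
    valid = [v for v in ious if v == v]  # NaN != NaN
    return sum(valid) / len(valid) if valid else float("nan")


# Toy example with 3 classes and 6 flattened pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(confusion_matrix(y_true, y_pred, 3))
print(ious, mean_iou(ious))
```

Here class 0 scores 1/3, class 1 scores 2/3, and class 2 scores 1/2, so the mean is 0.5. Computing IoU from a shared confusion matrix (rather than per image) avoids undefined ratios on tiles where a class is absent.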