Cloud cover is a prevalent challenge in optical remote sensing image processing, significantly hindering the visibility and interpretability of ground information. Deep learning has advanced rapidly in remote sensing cloud detection and has become an increasingly common and effective solution to this problem. To investigate the effectiveness of different deep learning architectures for cloud detection, this study selected five representative models, AlexNet, VGG16, GoogLeNet, ResNet34, and the Swin Transformer, and conducted a comprehensive comparative analysis of their performance in identifying clouds in medium-resolution Landsat 8 satellite imagery. During training, each model exhibited distinct characteristics in accuracy, loss convergence, resource utilization, and training speed. In the test phase, model performance was evaluated using overall accuracy (OA) and other quantitative metrics, providing empirical evidence of the differences among the models. Visualization of the predictions and comparison with reference cloud images further revealed how well each model captures fine cloud details and handles cloud edges. To reduce the burden of pixel-level labeling, this study adopts a block-level labeling approach, which greatly simplifies annotation and substantially reduces the cost and time of data preparation.
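The block-level labeling scheme can be illustrated with a short sketch: instead of assigning a cloud/clear label to every pixel, the scene is tiled into non-overlapping square blocks and each block receives a single label. The helper below is a hypothetical NumPy implementation, not code from the study; edge regions that do not divide evenly are simply cropped for clarity.

```python
import numpy as np

def split_into_blocks(band: np.ndarray, block_size: int) -> np.ndarray:
    """Tile a single-band image into non-overlapping square blocks.

    Illustrates block-level (patch-based) labeling: each returned block
    would receive one cloud/clear label rather than a per-pixel mask.
    Rows/columns that do not divide evenly are cropped for simplicity.
    """
    h, w = band.shape
    h_crop = (h // block_size) * block_size
    w_crop = (w // block_size) * block_size
    cropped = band[:h_crop, :w_crop]
    # Reshape into a grid of blocks, then flatten the grid dimensions.
    blocks = cropped.reshape(
        h_crop // block_size, block_size,
        w_crop // block_size, block_size,
    ).swapaxes(1, 2)
    return blocks.reshape(-1, block_size, block_size)

# Example: a 100 x 130 band tiled into 32 x 32 blocks -> 3 * 4 = 12 blocks
band = np.zeros((100, 130), dtype=np.uint8)
patches = split_into_blocks(band, 32)
print(patches.shape)  # (12, 32, 32)
```

Each patch is then fed to the classifier as one sample, so a scene needs only as many labels as it has blocks rather than one label per pixel.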
The experimental results demonstrate that the Swin Transformer performs particularly well on the Landsat 8 cloud detection task, achieving an OA of 90.73% under block-level annotation, significantly higher than the other models compared. The study also found that the size of the input image blocks has a significant impact on detection results: under the same network architecture, cloud detection with 32 × 32 image blocks improves OA by at least 10 percentage points over 64 × 64 blocks. This finding provides useful guidance for optimizing cloud detection algorithms. In summary, this study not only verifies the superiority of the Swin Transformer for cloud detection in remote sensing images but also reveals the impact of image block size on detection performance, providing new ideas and directions for future work in remote sensing image processing and cloud detection.
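For reference, the OA metric used above is simply the fraction of blocks whose predicted cloud/clear label matches the reference label. The snippet below is a minimal sketch of that computation; the function name and the binary label encoding (1 = cloud, 0 = clear) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def overall_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Overall accuracy (OA): share of samples whose predicted label
    matches the reference label, e.g. block-level cloud/clear labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Example with 1 = cloud, 0 = clear (hypothetical labels)
truth = np.array([1, 1, 0, 0, 1, 0, 1, 0])
pred = np.array([1, 0, 0, 0, 1, 0, 1, 1])
print(overall_accuracy(truth, pred))  # 0.75
```

An OA of 90.73% thus means that 90.73% of the annotated image blocks were classified correctly.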