Detecting clouds, snow, and lakes in remote sensing images is vital because they can obscure underlying surface information and hinder data extraction. In this study, we apply a two-stage random forest (RF) algorithm to label Sentinel-2 images and then examine how six factors influence neural network performance: model architecture, encoder, learning rate adjustment strategy, loss function, input image size, and band combination. The Feature Pyramid Network (FPN) achieved the highest mean intersection over union (MIoU), at 87.14%. With small datasets, the multi-head self-attention mechanism was less effective than convolutional methods for feature extraction, and incorporating residual connections into convolutional blocks notably enhanced performance. False-color images (bands 12-3-2) yielded a 4.86% improvement in MIoU over true-color images (bands 4-3-2). Model architecture, encoder structure, and input band combination had the largest impact on performance, with parameter variations resulting in MIoU differences exceeding 5%. These results provide a reference for high-precision segmentation of clouds, snow, and lakes and offer insights for applying deep learning to high-precision information extraction from remote sensing images and to semantic segmentation with deep neural networks more broadly.
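Because MIoU is the metric behind every comparison above, a minimal sketch of how it is commonly computed may help. The abstract does not state the exact evaluation procedure, so this follows the standard definition (per-class intersection over union from a confusion matrix, averaged over the classes that occur). The function name `mean_iou` and the class ids in the usage note are illustrative assumptions, not taken from the study.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU of two integer label maps with the same shape.

    Standard definition, assumed here; the study's exact evaluation
    code is not given in the abstract.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Confusion matrix: rows are reference classes, columns are predicted classes.
    cm = np.bincount(
        num_classes * y_true + y_pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    # Average only over classes that appear in the reference or the prediction.
    present = union > 0
    return float((intersection[present] / union[present]).mean())
```

For example, with hypothetical class ids 0, 1, and 2, calling `mean_iou([[0, 0, 1], [1, 2, 2]], [[0, 1, 1], [1, 2, 0]], 3)` averages the per-class IoU values 1/3, 2/3, and 1/2.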