Cloud image segmentation is a technique that divides images captured by meteorological satellites or ground-based observations into different regions or categories. By extracting the distribution, shape, and dynamic features of clouds, it provides precise data support for the meteorological and environmental fields and plays an important role in photovoltaic (PV) power generation forecasting, site selection for astronomical observatories, and weather forecasting. This paper proposes a ground-based cloud image segmentation model based on an improved U-Net, which adopts an overall encoder–decoder structure. In the encoder stage, a dilated convolution–atrous spatial pyramid pooling (ASPP)–dilated convolution structure is constructed to enhance early cloud feature extraction. Dilated convolution expands the receptive field by inserting holes (zeros) between the elements of a standard convolution kernel, thereby capturing a larger range of contextual information, while ASPP maintains high resolution and attends to both local details and the global structure of the image. In the decoder stage, bicubic interpolation, which fits cubic polynomial functions to the pixel values of the input feature map, is used for up-sampling to restore the feature map resolution and improve the clarity of the segmented image. In addition, a novel skip connection structure is designed between the encoder and decoder, composed of a depthwise separable path (DS path) and an improved channel spatial attention module (Im-CSAM) connected in sequence. The DS path combines depthwise separable convolutions with residual structures to facilitate information exchange between high-level and low-level features, and the Im-CSAM is a modular attention mechanism that emphasizes important cloud features along the spatial and channel dimensions to enhance segmentation accuracy. Experiments show that, compared with the traditional U-Net, the accuracy, precision, and MIoU of the proposed model improve by 2.2%, 4.1%, and 5.0%, respectively, on the SWINySEG dataset, and by 3.2%, 3.6%, and 5.8%, respectively, on the TCDD dataset, demonstrating that the improved method has better generalization ability and segmentation performance.
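To make the encoder and decoder components described above concrete, the following is a minimal PyTorch sketch of a dilated convolution–ASPP–dilated convolution block and of bicubic up-sampling. The channel sizes, dilation rates (2 for the outer convolutions and 6/12/18 inside ASPP), and class names are illustrative assumptions and are not taken from the paper.

```python
# Sketch of the dilated conv -> ASPP -> dilated conv encoder block and the
# bicubic up-sampling step. Hyperparameters are assumptions, not paper values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions plus an
    image-level pooling branch, concatenated and projected back to out_ch."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)]  # 1x1 branch
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
               for r in rates]
        )
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))


class DilatedASPPBlock(nn.Module):
    """Encoder block: dilated convolution -> ASPP -> dilated convolution."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        self.conv_in = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.aspp = ASPP(out_ch, out_ch)
        self.conv_out = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.conv_out(self.aspp(self.conv_in(x)))


def bicubic_upsample(x, scale=2):
    """Decoder up-sampling using cubic-polynomial (bicubic) interpolation."""
    return F.interpolate(x, scale_factor=scale, mode="bicubic", align_corners=False)


if __name__ == "__main__":
    x = torch.randn(1, 3, 128, 128)        # toy ground-based cloud image
    feat = DilatedASPPBlock(3, 32)(x)      # encoder block preserves spatial size
    up = bicubic_upsample(feat)            # 128x128 -> 256x256
    print(feat.shape, up.shape)
```

The dilated convolutions and ASPP keep the feature map at full resolution while enlarging the receptive field, and the bicubic up-sampling in the decoder restores resolution with smoother edges than nearest-neighbor interpolation, which is the motivation the abstract gives for these choices.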