Solar irradiance is the main factor affecting the output of a photovoltaic (PV) power station, and ground-based clouds play the dominant role in the variation of direct solar radiation across the whole sky. Ground-based cloud segmentation is therefore crucial for photovoltaic power generation prediction. Despite recent advancements, current cloud segmentation methods fail to meet the requirements of this application. While convolutional neural networks demonstrate impressive capabilities in ground-based cloud segmentation tasks, their inherent limitations hinder further performance enhancement. Hence, this paper introduces CloudSwinNet, a hybrid CNN-Transformer framework for fine-grained segmentation of ground-based cloud images. CloudSwinNet leverages concepts from convolutional neural networks, including stepwise downsampling, local convolution, and skip connections. To further exploit the fine-grained features in ground-based cloud images, we incorporate a Fine-grained Feature Fusion Module (Fg-FFM) into the encoder, while the skip-connection structure integrates the GloRe spatial graph reasoning module. These additions enable comprehensive learning of multi-scale features and long-range dependencies. Extensive experiments on a fine-grained ground-based cloud dataset demonstrate that CloudSwinNet outperforms six semantic segmentation networks in both qualitative and quantitative evaluations, and ablation experiments confirm the effectiveness of the modules introduced in this paper.