Remote sensing (RS) images contain rich information with broad potential for image segmentation applications. However, Convolutional Neural Networks (CNNs) struggle to fully capture global contextual information. Leveraging the strong global modeling capability of the Swin-Transformer, we propose GLE-Net, a novel RS image segmentation model that combines the Swin-Transformer with a CNN in a redesigned encoder. In the sub-branch, a Multiscale Feature Fusion Module (MFM) first extracts features at multiple scales from the RS image, capturing rich semantic information, resolving fine local details, and handling occlusions. A Feature Compression Module (FCM) is then introduced in the main branch to downsample the feature maps while reducing information loss and preserving fine details, improving segmentation accuracy for small targets. Finally, a Spatial Information Enhancement Module (SIEM) fuses local and global features for comprehensive feature modeling, further strengthening the model's segmentation ability. Experiments on the public ISPRS datasets yield remarkable results, underscoring the potential of our model for RS image segmentation.
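
To make the dual-branch encoder described above more concrete, the following is a minimal PyTorch-style sketch of how the three modules could be wired together. The internals shown for MFM (parallel dilated convolutions), FCM (strided compression), and SIEM (gated fusion), as well as the class names `GLEEncoder`, channel widths, and the stand-in for the Swin-Transformer stage, are illustrative assumptions and not the authors' actual implementation.

```python
# Sketch of a dual-branch encoder with MFM, FCM, and SIEM, assuming PyTorch.
# Module internals are placeholders chosen for illustration only.
import torch
import torch.nn as nn


class MFM(nn.Module):
    """Multiscale Feature Fusion Module (sketch): parallel convolutions with
    different dilation rates, fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class FCM(nn.Module):
    """Feature Compression Module (sketch): strided convolution that halves the
    spatial resolution while limiting information loss via BN + ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.compress = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.compress(x)


class SIEM(nn.Module):
    """Spatial Information Enhancement Module (sketch): gated fusion of local
    (CNN) and global (transformer-branch) features of the same shape."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, local_feat, global_feat):
        g = self.gate(torch.cat([local_feat, global_feat], dim=1))
        return g * local_feat + (1 - g) * global_feat


class GLEEncoder(nn.Module):
    """Dual-branch encoder sketch: a sub-branch with MFM for multiscale local
    features, a main branch with FCM (standing in for the Swin-Transformer
    stage), and SIEM fusion of the two feature streams."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.mfm = MFM(ch, ch)    # sub-branch: multiscale local features
        self.fcm = FCM(ch, ch)    # main branch: compressed (global) features
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.siem = SIEM(ch)      # fuse local and global features

    def forward(self, x):
        x = self.stem(x)
        local_feat = self.mfm(x)
        global_feat = self.up(self.fcm(x))  # restore resolution before fusion
        return self.siem(local_feat, global_feat)


if __name__ == "__main__":
    enc = GLEEncoder()
    out = enc(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 64, 256, 256])
```

The gated fusion in `SIEM` is one common way to weigh local against global responses per pixel; the paper's actual fusion rule, channel widths, and Swin-Transformer backbone should be taken from the full method section rather than this sketch.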