Abstract
CNN-based semantic segmentation methods for landslide detection suffer from poor object localization, difficulty in object recognition, and inadequate segmentation performance. To address these issues, this paper proposes a semantic segmentation model for landslide detection in remote sensing images based on the Swin Transformer and ResUNet architectures. The method combines the global feature modeling capability of the Swin Transformer with the local feature extraction ability of CNNs and integrates our proposed RCCT module for landslide segmentation. We apply the model to the identification and extraction of landslides in Menyuan County and the Sanjiangyuan region of Qinghai Province, China, achieving an F1 score of 81.91. We further evaluate the model on the BiJie Landslide dataset against four recently developed landslide recognition methods. The experimental results indicate that our model outperforms the comparative approaches.
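
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score can be computed for a binary landslide mask. It assumes a pixel-wise evaluation and a score expressed as a percentage, neither of which is stated in the abstract. The function name `f1_score` and the toy masks are hypothetical and serve only as illustration.

```python
def f1_score(pred, truth):
    """Pixel-wise F1, as a percentage, for two binary masks of equal shape.

    Assumption: masks are nested lists where 1 marks a landslide pixel
    and 0 marks background.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # landslide pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as landslide
            elif t and not p:
                fn += 1      # landslide pixel missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); return 0.0 when there are no positives at all
    return 100.0 * 2 * tp / denom if denom else 0.0


# Toy 2x3 masks (illustrative only, not data from the paper)
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = f1_score(pred, truth)
```

F1 is the harmonic mean of precision and recall, so it penalizes both false alarms and missed landslide pixels. This makes it more informative than plain accuracy when landslide pixels are rare.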