Text segmentation in scene images is an important task in computer vision, but the complexity and diversity of backgrounds make it challenging. Supervised image segmentation methods require paired semantic label data to ensure segmentation accuracy, but such labels are often difficult to obtain. To solve this problem, we propose an unsupervised scene image text segmentation model based on the image style transfer model cycle-consistent generative adversarial network (CycleGAN), which is trained on partial, unpaired label data. Text segmentation is achieved by converting a complex background into a simple background. Since the images generated by CycleGAN cannot retain the details of the text content, we also introduce an atrous spatial pyramid pooling (ASPP) module to extract text features at multiple scales, which improves the quality of the generated images. Experiments on a synthetic data set, the IIIT 5K-Word data set, and the MACT data set show that the proposed method effectively segments the text and preserves the details of the text content.
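
To illustrate the multi-scale idea behind ASPP, the following is a minimal toy sketch, not the model described above. It assumes a 1-D signal and a fixed kernel, whereas an actual ASPP module applies learned 2-D atrous convolutions in parallel and typically fuses the branches by concatenation followed by a 1x1 convolution. The names `atrous_conv1d` and `aspp_1d`, the dilation rates, and the averaging step are hypothetical choices made only for illustration.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart, so a larger rate widens
    the receptive field without adding weights. Output length equals
    input length.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # taps outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run parallel atrous branches and fuse them.

    Assumption: branches are fused by a plain average, standing in for
    the learned fusion used in a real ASPP module.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return branches, fused


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    branches, fused = aspp_1d(impulse, [1, 1, 1], rates=(1, 2))
    for rate, branch in zip((1, 2), branches):
        print(f"rate={rate}: {branch}")
    print(f"fused:  {fused}")
```

Each branch responds to the same impulse over a different span, which is how parallel dilation rates let a single feature map capture text strokes at several scales.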