The rich contextual information and multiscale ground-object information in remote sensing images are crucial to improving semantic segmentation accuracy. We therefore propose a remote sensing image semantic segmentation method that integrates multilevel spatial-channel attention with multiscale dilated convolution, effectively addressing the poor segmentation performance on small target objects in remote sensing images. The method builds a multilevel feature fusion structure that combines deep semantic features with shallow-level details to generate multiscale feature maps. We then introduce serially combined dilated convolutions into each branch of the atrous spatial pyramid pooling (ASPP) structure to reduce the loss of small-target information; a sketch of this idea appears below. Finally, a convolutional conditional random field (ConvCRF) is applied to model spatial and edge context, improving the model's ability to capture fine details. We validate the effectiveness of the model on three public datasets. For quantitative evaluation, we report four metrics: F1 score, overall accuracy (OA), Intersection over Union (IoU), and Mean Intersection over Union (MIoU). On the GID dataset, the F1 score, OA, and MIoU reach 87.27, 87.80, and 77.70, respectively, surpassing most mainstream semantic segmentation networks.
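To make the ASPP modification concrete, the following is a minimal PyTorch sketch of an ASPP variant in which each parallel branch chains two dilated convolutions in series, rather than using a single dilated convolution per branch. The class names (`SerialDilatedBranch`, `SerialASPP`), channel counts, and dilation rates are illustrative assumptions, not values taken from the paper.

```python
# Sketch of an ASPP variant with serially combined dilated convolutions
# per branch. All hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn


class SerialDilatedBranch(nn.Module):
    """One ASPP branch: two 3x3 dilated convolutions applied in series."""

    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            # Second dilated convolution in series enlarges the effective
            # receptive field more gradually than one large-rate convolution.
            nn.Conv2d(out_ch, out_ch, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.branch(x)


class SerialASPP(nn.Module):
    """ASPP whose parallel branches each use serial dilated convolutions."""

    def __init__(self, in_ch=2048, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [SerialDilatedBranch(in_ch, out_ch, r) for r in rates]
        )
        # 1x1 convolution fuses the concatenated multi-rate branch outputs.
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]        # parallel multi-rate branches
        return self.project(torch.cat(feats, dim=1))  # fuse along channels


x = torch.randn(1, 2048, 32, 32)
print(SerialASPP()(x).shape)  # torch.Size([1, 256, 32, 32])
```

Stacking two smaller-rate dilated convolutions in each branch keeps the sampling grid denser than a single large-rate convolution, which is one plausible way such a design can retain small-object detail while still capturing wide context.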