As one of the major global food crops, the monitoring and management of the winter wheat planting area is of great significance for agricultural production and food security worldwide. Today, the development of high-resolution remote sensing imaging technology has provided rich sources of data for extracting the visual planting information of winter wheat. However, the existing research mostly focuses on extracting the planting plots that have a simple terrain structure. In the face of diverse terrain features combining mountainous areas, plains, and saline alkali land, as well as small-scale but complex planting structures, the extraction of planting plots through remote sensing imaging is subjected to great challenges in terms of recognition accuracy and model complexity. In this paper, we propose a modified Segformer model for extracting winter wheat planting plots with complex structures in rural areas based on the 0.8 m high-resolution multispectral data obtained from the Gaofen-2 satellite, which significantly improves the extraction accuracy and efficiency under complex conditions. In the encoder and decoder of this method, new modules were developed for the purpose of optimizing the feature extraction and fusion process. Specifically, the improvement measures of the proposed method include: (1) The MixFFN module in the original Segformer model is replaced with the Multi-Scale Feature Fusion Fully-connected Network (MSF-FFN) module, which enhances the model’s representation ability in handling complex terrain features through multi-scale feature extraction and position embedding convolution; furthermore, the DropPath mechanism is introduced to reduce the possibility of overfitting while improving the model’s generalization ability. (2) In the decoder part, after fusing features at four different scales, a CoordAttention module is added, which can precisely locate important regions with enhanced features in the images by utilizing the coordinate attention mechanism, therefore further improving the model’s extraction accuracy. (3) The model’s input data are strengthened by incorporating multispectral indices, which are also conducive to the improvement of the overall extraction accuracy. The experimental results show that the accuracy rate of the modified Segformer model in extracting winter wheat planting plots is significantly increased compared to traditional segmentation models, with the mean Intersection over Union (mIOU) and mean Pixel Accuracy (mPA) reaching 89.88% and 94.67%, respectively (an increase of 1.93 and 1.23 percentage points, respectively, compared to the baseline model). Meanwhile, the parameter count and computational complexity are significantly reduced compared to other similar models. Furthermore, when multispectral indices are input into the model, the mIOU and mPA reach 90.97% and 95.16%, respectively (an increase of 3.02 and 1.72 percentage points, respectively, compared to the baseline model).