China has one of the largest citrus cultivation areas in the world, and its citrus industry has received significant attention because of its substantial economic benefits. Traditional field surveys and manual interpretation of remote sensing images are labor-intensive and time-consuming, whereas remote sensing technology holds great potential for acquiring spatial information on citrus orchards over large areas. This study proposes a lightweight model for citrus plantation extraction that combines the DeepLabV3+ model with the convolutional block attention module (CBAM), focusing on the phenological growth characteristics of citrus in the Guangxi region. The objective is to address the inaccurate extraction of citrus edges in high-resolution images, the misclassification and omission caused by intra-class differences, and the large parameter counts and long training times of classical semantic segmentation models. To reduce the parameter count and accelerate training, the lightweight MobileNetV2 network replaces the Xception backbone of DeepLabV3+, and CBAM is introduced to extract citrus features more accurately and efficiently. In addition, taking the growth characteristics of citrus into account, the feature input is augmented with additional channels so that key phenological features of citrus are better captured and utilized, thereby improving recognition accuracy. The results demonstrate that the improved DeepLabV3+ model is highly reliable for citrus recognition and extraction, achieving an overall accuracy (OA) of 96.23%, a mean pixel accuracy (mPA) of 83.79%, and a mean intersection over union (mIoU) of 85.40%, improvements of 11.16%, 14.88%, and 14.98%, respectively, over the original DeepLabV3+ model. Compared with classical semantic segmentation models such as UNet and PSPNet, the proposed model also achieves higher recognition accuracy, while substantially reducing both the number of parameters and the training time. Generalization experiments conducted in Nanning, Guangxi further validate the model's strong generalization capability. Overall, the proposed approach balances extraction accuracy, a reduced parameter count, and timeliness, enabling rapid and accurate extraction of citrus plantation areas and presenting promising application prospects.
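To make the described architectural modifications concrete, the following is a minimal PyTorch sketch of a CBAM block and a MobileNetV2 feature extractor adapted to accept an augmented multi-channel input. It illustrates the general idea only; the module names, the 4-channel input, and the placement of CBAM are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2


class ChannelAttention(nn.Module):
    """CBAM channel attention: global avg/max pooling, a shared MLP,
    and a sigmoid gate over channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """CBAM spatial attention: pool over channels, convolve to a 1-channel mask."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Woo et al., 2018)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))


class MobileNetV2Backbone(nn.Module):
    """MobileNetV2 feature extractor (hypothetical wrapper) adapted to accept
    extra input channels, e.g. a phenology-related band stacked onto RGB."""
    def __init__(self, in_channels=4):
        super().__init__()
        features = mobilenet_v2(weights=None).features
        if in_channels != 3:
            # Replace the stem convolution so the network accepts the augmented input.
            old = features[0][0]
            features[0][0] = nn.Conv2d(in_channels, old.out_channels,
                                       kernel_size=old.kernel_size,
                                       stride=old.stride,
                                       padding=old.padding, bias=False)
        self.low_level = features[:4]   # stride-4 features for the decoder skip connection
        self.high_level = features[4:]  # deeper features fed to the ASPP module

    def forward(self, x):
        low = self.low_level(x)
        high = self.high_level(low)
        return low, high
```

In a full DeepLabV3+-style network, a CBAM block of this kind would typically be applied to the backbone output and/or the ASPP output before decoding; the exact insertion points used in this study follow the architecture described in the main text.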