Fallen leaf disease reduces leaf area and photosynthetic output, which leads to insufficient sugar accumulation in fruit, poor coloring and flavor, and sunburn on a large number of fruits. To address this problem, this article introduces a deep learning algorithm for the segmentation and recognition of agricultural disease images, particularly images of leaf lesions. The core of the algorithm is an enhancement of the encoder and attention mechanism of the Multi-scale Attention Net (MA-Net) that improves the model's performance on agricultural disease images. First, MA-Net was analyzed and its limitations were identified. Then, the feature extraction parts of different networks were used as encoders within MA-Net. Compared with the res-block, the Mix Vision Transformer (MiT) takes less time to train, captures global and contextual information in images better, and is more robust and scalable. Compared with the Position-wise Attention Block (PAB), which has higher computational complexity and requires more computing resources, the Efficient Channel Attention network (ECANet) reduces the number of model parameters and the amount of computation by learning the correlation between channels, and it also has better denoising ability. The experimental results show that the proposed solution achieves high accuracy and stability in agricultural disease image segmentation and recognition. The mean Intersection over Union (mIoU) is 98.1%, which is 0.2% higher than that of the original MA-Net; the Dice Loss is 0.9%, which is 0.1% lower than that of the original MA-Net.
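For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU and Dice Loss can be computed on hard-labeled segmentation masks. It is an illustration only: the function names `mean_iou` and `dice_loss`, the toy masks, and the choice to average IoU over the classes that actually occur are assumptions, not the evaluation code used in this work. During training, Dice Loss is usually computed on soft probabilities rather than on hard labels as done here.

```python
from typing import Sequence

# A mask is a 2D grid of integer class labels (hypothetical toy format).
Mask = Sequence[Sequence[int]]


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Mean IoU, averaged over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_p, in_t = (p == c), (t == c)
                inter += int(in_p and in_t)
                union += int(in_p or in_t)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def dice_loss(pred: Mask, target: Mask, positive: int = 1,
              eps: float = 1e-7) -> float:
    """1 - Dice coefficient for a single class, on hard labels."""
    inter = p_sum = t_sum = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            in_p, in_t = (p == positive), (t == positive)
            inter += int(in_p and in_t)
            p_sum += int(in_p)
            t_sum += int(in_t)
    # eps keeps the ratio defined when the class is absent from both masks
    return 1.0 - (2 * inter + eps) / (p_sum + t_sum + eps)


# Toy example: 0 = healthy leaf / background, 1 = lesion.
pred = [[0, 0, 1],
        [0, 1, 1]]
target = [[0, 1, 1],
          [0, 1, 1]]

print(f"mIoU      = {mean_iou(pred, target, num_classes=2):.4f}")
print(f"Dice loss = {dice_loss(pred, target):.4f}")
```

In this toy case one lesion pixel is missed, so the background IoU is 2/3, the lesion IoU is 3/4, and the Dice coefficient for the lesion class is 6/7. A perfect prediction gives an mIoU of 1 and a Dice Loss of 0, which is why a higher mIoU and a lower Dice Loss both indicate better segmentation.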