Despite advances in deep learning for plant leaf disease recognition, accurately distinguishing morphological features under varying environmental conditions remains challenging. Conventional deep learning models often fail to merge local and global information effectively, especially on small-scale datasets, which impairs performance and raises training costs. Focusing on citrus diseases, we propose an improved FasterViT model, a hybrid CNN-ViT framework that builds on FasterViT. The proposed model combines the fast local feature learning of CNNs with the global information processing of ViTs to extract complex textures and morphological features from images. Mixup and Cutout are alternated across stages to improve robustness and generalization; by simulating a more diverse training environment, they are particularly valuable for fast learning on small-scale datasets. Triplet Attention and AdaptiveAvgPool are used to reduce training costs and improve training performance. The model is evaluated on both our purpose-built small-scale citrus disease dataset, called the in-field small dataset, and the comprehensive PlantVillage dataset. The experimental results show that the model learns quickly and adapts to small-sample training in plant disease detection tasks, and that the proposed improvements raise accuracy while reducing training costs. Its strong performance in transfer learning scenarios further indicates its adaptability and broad applicability. This study highlights the efficacy of the improved FasterViT model in addressing the complexities of plant disease image recognition and suggests a new paradigm for developing efficient, scalable, and robust classification systems.
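As a rough illustration of the augmentation idea mentioned above, the following sketch alternates Mixup and Cutout by stage index. It is not the authors' implementation: the even/odd schedule, the Beta parameter of 0.2, the patch size, and the function names `mixup`, `cutout`, and `augment` are all assumptions made for illustration, and images are plain nested lists so that the example needs only the standard library.

```python
import random


def mixup(img_a, img_b, label_a, label_b, lam):
    """Blend two images and their one-hot labels with weight lam in [0, 1]."""
    mixed_img = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_img, mixed_label


def cutout(img, top, left, size):
    """Return a copy of img with a size x size square set to zero.

    The square is clipped at the image borders; the input is not modified.
    """
    out = [row[:] for row in img]
    for r in range(max(0, top), min(len(out), top + size)):
        for c in range(max(0, left), min(len(out[r]), left + size)):
            out[r][c] = 0.0
    return out


def augment(img, label, partner_img, partner_label, stage, rng):
    """Hypothetical schedule: Mixup on even stages, Cutout on odd stages."""
    if stage % 2 == 0:
        # Mixing weight drawn from Beta(0.2, 0.2); 0.2 is an assumed value.
        lam = rng.betavariate(0.2, 0.2)
        return mixup(img, partner_img, label, partner_label, lam)
    # Assumed patch size: half the image height, at least one pixel.
    size = max(1, len(img) // 2)
    top = rng.randrange(len(img))
    left = rng.randrange(len(img[0]))
    return cutout(img, top, left, size), label
```

Mixup softens the labels along with the pixels, whereas Cutout leaves the label unchanged and only hides part of the image, so alternating them exposes the model to two different kinds of perturbation.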