Beet crops are highly vulnerable to pest infestations throughout their growth cycle, which significantly affects crop development and yield. Timely and accurate pest identification is crucial for implementing effective control measures. Current pest detection tasks face two primary challenges: first, pests frequently blend into their environment due to similar colors, making it difficult to capture distinguishing features in the field; second, pest images exhibit scale variations under different viewing angles, lighting conditions, and distances, which complicates the detection process. This study constructed the BeetPest dataset, a multi-scale pest dataset for beets in complex backgrounds, and proposed the SP-YOLO model, which is an improved real-time detection model based on YOLO11. The model integrates a CNN and transformer (CAT) into the backbone network to capture global features. The lightweight depthwise separable convolution block (DSCB) module is designed to extract multi-scale features and enlarge the receptive field. The neck utilizes the cross-layer path aggregation network (CLPAN) module, further merging low-level and high-level features. SP-YOLO effectively differentiates between the background and target, excelling in handling scale variations in pest images. In comparison with the original YOLO11 model, SP-YOLO shows a 4.9% improvement in mean average precision (mAP@50), a 9.9% increase in precision, and a 1.3% rise in average recall. Furthermore, SP-YOLO achieves a detection speed of 136 frames per second (FPS), meeting real-time pest detection requirements. The model demonstrates remarkable robustness on other pest datasets while maintaining a manageable parameter size and computational complexity suitable for edge devices.
Read full abstract