Detecting green fruits presents significant challenges due to their close resemblance in color to the leaves in an orchard environment. We designed GreenFruitDetector, a lightweight model based on an improved YOLO v8 architecture, specifically for green fruit detection. In the Backbone network, we replace ordinary convolution with Deformable Convolution to enhance the extraction of geometric features. Additionally, we designed MCAG-DC (Multi-path Coordinate Attention Guided Deformer Convolution) to replace the convolution in C2f, enhancing the Backbone's feature extraction capability when encountering occlusion problems. For the Neck part of the algorithm, we designed a Fusion-neck structure that integrates spatial detail information from feature maps at different scales, thereby enhancing the network's ability to extract multi-scale information. Additionally, we devised a new detection head that incorporates multi-scale information, significantly improving the detection of small and distant objects. Finally, we applied channel pruning techniques to reduce the model size, parameter count, and FLOPs to 50%, 55%, and 44% of the original, respectively. We trained and evaluated the improved model on three green fruit datasets. The accuracy of the improved model reached 94.5%, 84.4%, and 85.9% on the Korla Pear, Guava, and Green Apple datasets, respectively, representing improvements of 1.17%, 1.1%, and 1.77% over the baseline model. The mAP@0.5 increased by 0.72%, 6.5%, and 0.9%, respectively, and the recall rate increased by 1.97%, 1.1%, and 0.49%, respectively.
Read full abstract