Computer vision technology is widely used in smart agriculture, primarily because of its non-invasive nature, which avoids causing damage to delicate crops. Nevertheless, the deployment of computer vision algorithms on agricultural machinery with limited computing resources represents a significant challenge. Algorithm optimization with the aim of achieving an equilibrium between accuracy and computational power represents a pivotal research topic and is the core focus of our work. In this paper, we put forward a lightweight hybrid network, named VM-YOLO, for the purpose of detecting strawberry flowers. Firstly, a multi-branch architecture-based fast convolutional sampling module, designated as Light C2f, is proposed to replace the C2f module in the backbone of YOLOv8, in order to enhance the network’s capacity to perceive multi-scale features. Secondly, a state space model-based lightweight neck with a global sensitivity field, designated as VMambaNeck, is proposed to replace the original neck of YOLOv8. After the training and testing of the improved algorithm on a self-constructed strawberry flower dataset, a series of experiments is conducted to evaluate the performance of the model, including ablation experiments, multi-dataset comparative experiments, and comparative experiments against state-of-the-art algorithms. The results show that the VM-YOLO network exhibits superior performance in object detection tasks across diverse datasets compared to the baseline. Furthermore, the results also demonstrate that VM-YOLO has better performances in the mAP, inference speed, and the number of parameters compared to the YOLOv6, Faster R-CNN, FCOS, and RetinaNet.
Read full abstract