Accurate detection and localization of fruit in natural environments is a key step toward precise harvesting by fruit-picking robots. However, existing banana detection and localization methods have two main limitations in practice: their large parameter counts make deployment difficult, and their performance leaves room for improvement. To address these issues, we propose a high-precision, lightweight method for banana bunch recognition and localization and deploy it on an edge device. First, we propose the Slim-Banana model, an improved version of YOLOv8l. To reduce computation while maintaining high performance, Slim-Banana replaces standard convolution with GSConv, which combines grouped convolution and spatial convolution. In addition, a cross-stage partial network (GSCSP) module is designed to reduce both the computational cost and the structural complexity of the network through a single-stage aggregation method. Next, a RealSense depth sensor is combined with time-of-flight (TOF) technology to perform image registration and 3D localization of the bananas. Finally, the pipeline is deployed on an Nvidia Orin NX edge device, and its performance and resource consumption in real operation are analyzed in depth. Experimental results show that our method achieves a detection precision of 0.947, a recall of 0.948, an mAP of 0.98, and an inference time of 113.6 ms; the network requires 4449 MiB of memory; and the average localization errors along the X, Y, and Z axes are 13.47 mm, 12.87 mm, and 13.87 mm, respectively. To our knowledge, this is the first work to implement banana detection and localization on edge devices. Compared with existing methods, our method performs better in complex orchard environments while remaining efficient and lightweight.
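To illustrate the 3D localization step, the sketch below shows one common way to turn a detected bounding box and a registered depth reading into a camera-frame position. It is not the authors' implementation. It assumes an ideal pinhole camera with no lens distortion and a depth image already aligned to the color image. The names `Intrinsics`, `bbox_center`, and `deproject`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical values are used below)."""
    fx: float  # focal length along the image x-axis
    fy: float  # focal length along the image y-axis
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def bbox_center(x1: float, y1: float, x2: float, y2: float) -> tuple[float, float]:
    """Return the pixel at the center of a detection box given its two corners."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def deproject(u: float, v: float, depth_mm: float, k: Intrinsics) -> tuple[float, float, float]:
    """Map pixel (u, v) with depth in millimetres to camera-frame (X, Y, Z) in millimetres.

    Uses the distortion-free pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth
    """
    x = (u - k.cx) * depth_mm / k.fx
    y = (v - k.cy) * depth_mm / k.fy
    return (x, y, depth_mm)


if __name__ == "__main__":
    # Assumed intrinsics for a 640x480 image; real values come from the sensor calibration.
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    u, v = bbox_center(300.0, 200.0, 340.0, 280.0)
    # Depth at the box center, assumed to be read from the registered depth image.
    print(deproject(u, v, 1000.0, k))
```

In practice the depth at a single pixel can be noisy or missing, so a median over a small window around the box center is a common alternative; that choice is also an assumption here, not something stated in the abstract.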