The core challenge in automating tomato harvesting in greenhouses is identifying and localizing the fruit precisely. This paper presents an approach that combines an improved YOLOv5 detection algorithm with optimized binocular stereo vision. First, a C3-Transformer Encoder (CTM) structure and a Bidirectional Feature Pyramid Network (Bi-FPN) are introduced into the detector to improve tomato recognition, particularly against complex backgrounds and under occlusion. In field tests, mAP50 reached 97.1%, an increase of 1.2 percentage points. Second, depth is acquired with a ZED binocular camera, and the census stereo matching algorithm is optimized to significantly reduce disparity errors and thereby improve the accuracy of the depth information. This allows the three-dimensional position of tomatoes obscured by branches and leaves to be calculated accurately, greatly improving the efficiency of the harvesting robot. Field debugging and verification with the harvesting robot show that the proposed method recognizes and localizes tomatoes with high accuracy and reliability in complex greenhouse environments.
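To make the localization step concrete, the following is a minimal sketch of textbook census matching followed by stereo triangulation. It is not the optimized matcher described in this paper, whose details the abstract does not give. The function names, the 3x3 window, the winner-takes-all search, and the pinhole parameters (`fx`, `fy`, `cx`, `cy`, `baseline`) are all assumptions made for illustration, and rectified images are assumed.

```python
def census_transform(img, y, x, radius=1):
    """Encode the window centred at (y, x) as a bit string (returned as int).

    Each neighbour contributes 1 if it is darker than the centre pixel, else 0.
    `img` is a list of rows of grey values; the caller must keep the window
    inside the image.
    """
    center = img[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < center else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census codes (the matching cost)."""
    return bin(a ^ b).count("1")


def best_disparity(left, right, y, x, max_disp, radius=1):
    """Winner-takes-all disparity for left pixel (y, x) on rectified images.

    Assumption: a left pixel at column x matches the right pixel at x - d.
    """
    ref = census_transform(left, y, x, radius)
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d - radius < 0:  # window would leave the right image
            break
        cost = hamming(ref, census_transform(right, y, x - d, radius))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def pixel_to_3d(u, v, disparity, fx, fy, cx, cy, baseline):
    """Back-project pixel (u, v) to camera coordinates with Z = fx * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive to triangulate")
    z = fx * baseline / disparity
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

In a pipeline like the one described, `(u, v)` would typically be the centre of a detected bounding box, and the intrinsics and baseline would come from the camera calibration rather than from hard-coded values.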