Abstract
The development of robotic systems for apple picking is a crucial advancement in agricultural technology, particularly in light of ongoing labor shortages and the continuing evolution of automation. At present, when picking apples in complex environments, it is difficult to classify and identify the growth pattern of an apple while simultaneously obtaining information on its attitude. In this paper, deep learning and stereo vision are integrated to identify the growth pattern and attitude of apples in the natural environment and to realize three-dimensional spatial positioning. This study proposes a fusion recognition method based on an improved YOLOv7 for apple growth-morphology classification and fruit localization. Firstly, the multi-scale feature fusion network is improved by adding a 160 × 160 feature scale layer to the backbone network, which enhances the model's sensitivity to very small local features. Secondly, the CBAM attention mechanism is introduced to strengthen the network's attention to the target region of interest in the input image. Finally, the Soft-NMS algorithm is adopted, which prevents densely overlapping targets from being suppressed outright and thereby reduces missed detections. In addition, the UNet segmentation network is combined with minimum-enclosing-circle and minimum-bounding-rectangle features to obtain the attitude of unobstructed apples. A depth image of the apple is acquired with an RGB-D camera, and the 3D coordinates of the apple picking point are obtained by combining the 2D coordinates in the RGB image with the corresponding depth value.
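To illustrate the Soft-NMS step described above, the following is a minimal NumPy sketch of the Gaussian-decay formulation of Soft-NMS as commonly defined in the literature. The abstract does not state the paper's exact parameters, so the decay parameter `sigma` and the score threshold below are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: instead of discarding boxes that overlap the current
    best detection, decay their scores by exp(-IoU^2 / sigma), so dense
    overlapping targets (e.g., clustered apples) are not suppressed outright."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    keep, idxs = [], list(range(len(scores)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])  # highest remaining score
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            iou = box_iou(boxes[best], boxes[i])
            scores[i] *= np.exp(-(iou ** 2) / sigma)  # Gaussian penalty
        idxs = [i for i in idxs if scores[i] >= score_thresh]  # drop faded boxes
    return keep
```

With a permissive threshold, a heavily overlapped box survives with a decayed score rather than being deleted in a single pass, which is what reduces missed detections among clustered fruit.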
The experimental results show that the precision, recall and mean average precision of the improved YOLOv7 are 86.9%, 80.5% and 87.1%, respectively, which are 4.2, 2.2 and 3.7 percentage points higher than those of the original YOLOv7 model. In addition, the average angular error of the apple attitude detection method is 3.964°, with an accuracy of 94%, and the error in the three-dimensional coordinate positioning of the apple lies within 0.01 mm–1.53 mm, which meets the demands of apple-picking system operation. The deep-learning-based stereo vision system constructed here for apple-picking robots can effectively identify and locate apples, satisfies the vision-system requirements of the automated picking task, and lays a foundation for lossless and efficient apple picking.
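The 3D localization step combines the 2D pixel coordinates from the RGB image with the depth value from the RGB-D camera. Assuming a standard pinhole camera model with intrinsics fx, fy, cx, cy (the abstract does not give the camera parameters, so the values used below are purely illustrative), the back-projection of a picking point can be sketched as:

```python
import numpy as np

def pixel_to_camera_xyz(u, v, depth_mm, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z (mm) into camera coordinates
    via the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    z = float(depth_mm)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```

For example, a pixel at the principal point maps to (0, 0, Z); off-center pixels acquire lateral offsets proportional to their depth, which is how the 2D detection and the depth image together yield the 3D picking-point coordinates.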