Automatic tomato harvesting has been achievable in the laboratory for many years. The current research challenge is to harvest tomatoes more flexibly and nondestructively at any tomato bunch pose, according to agronomic demands. Although the tomato pose can be predicted by keypoint detection, the poor data quality of commercial RGBD cameras, occlusion between plant organs, the variety of tomato poses, and unstructured working environments all make tomato bunch pose detection difficult. We therefore propose an improved version of the Tomato Pose Method (TPM), namely TPMv2, a two-stage end-to-end multi-task network. The network provides comprehensive information on the tomato bunch, including the positions and poses of the stem, peduncle, and fruits, by predicting the two-dimensional bounding box (2D BBox), three-dimensional bounding box (3D BBox), two-dimensional keypoint (2D Kpt), and three-dimensional keypoint (3D Kpt). To address occlusion and poor-quality point clouds, we design a keypoint network (KPN) specifically for tomatoes, with a novel keypoint processing pipeline that improves keypoint positioning accuracy and effectively reduces abnormal predictions. TPMv2 makes it possible to detect the tomato bunch pose precisely with an economical camera while avoiding dangerous situations caused by abnormal predictions. The precision of the 2D BBox and 3D BBox reached 0.9372 and 0.8700, and the Percentage of Correct Keypoints (PCK) of the 2D Kpt and 3D Kpt reached 0.8882 and 0.7836, respectively. About 78.36% of the 3D Kpt positioning errors are less than 20 mm, which is sufficient to describe a correct pose trend from the 3D Kpts and helps the manipulator plan a more reasonable trajectory for nondestructive harvesting.
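As an illustration of how a figure such as the 3D PCK at a 20 mm threshold could be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes that a keypoint counts as correct when the Euclidean distance between its predicted and ground-truth positions is strictly below the threshold. The function name `pck_3d`, its parameters, and the sample coordinates are hypothetical.

```python
import math


def pck_3d(predicted, ground_truth, threshold_mm=20.0):
    """Return the fraction of keypoints whose 3D error is below threshold_mm.

    Assumption: error is the Euclidean distance in millimetres and the
    comparison is strict (error < threshold). The abstract does not state
    the exact formulation used by the authors.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one keypoint is required")

    correct = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if math.dist(p, g) < threshold_mm
    )
    return correct / len(predicted)


# Hypothetical keypoint coordinates in millimetres, for illustration only.
pred = [(0.0, 0.0, 0.0), (100.0, 50.0, 30.0), (10.0, 10.0, 10.0), (200.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (100.0, 50.0, 55.0), (10.0, 10.0, 22.0), (200.0, 0.0, 19.0)]

# Per-keypoint errors are 5, 25, 12, and 19 mm, so three of four fall below 20 mm.
print(pck_3d(pred, gt))  # 0.75
```

Under this reading, a PCK of 0.7836 means that roughly 78% of the evaluated 3D keypoints lie within 20 mm of their annotated positions.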