The agricultural sector worldwide faces a shortage of human labor for fruit harvesting and processing operations, motivating advanced technologies such as human–robot collaboration (HRC), in which robots work alongside humans to improve safety, efficiency, and productivity. In this context, we propose a new end-to-end vision-based detection, tracking, and classification (DTC) architecture that enables robots to assist human pickers in fruit-picking operations. The proposed solution combines RGB-D camera perception, customized convolutional and recurrent neural network models, and pose estimation to detect, track, and classify the activities of human pickers from continuous streaming data captured across different seasons. It also provides a module for estimating the location of the human pickers, allowing the robot to navigate to the estimated goal and assist them in real time with logistics tasks during harvesting. We also contribute a new video dataset of real fruit-picking operations under suboptimal conditions, which is used to evaluate the effectiveness of the proposed solution. Simulation results, obtained from the collected experimental dataset, show that the overall insight-to-action computation time is around six seconds, even in challenging scenarios such as multiple human pickers in the scene. This research aims to establish a baseline end-to-end AI pipeline from insight prediction to prescriptive robot action in fruit-picking operations, reducing the pickers' working time and physical fatigue during in-field logistics. We expect this study to serve as a basis for developing anticipatory robot-scheduling methods that save pickers time and energy in logistics activities. The vision-based DTC architecture offers a reliable and cost-effective way to address the scarcity of human labor in agriculture by facilitating collaboration between humans and robots in fruit-picking tasks, as shown by simulations in the ROS-Gazebo framework.
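As a concrete illustration of the prescriptive-action step described above, the following minimal Python sketch shows how an estimated picker location could be dispatched as a navigation goal to the standard ROS move_base action server in a ROS-Gazebo simulation. This is an assumption-laden illustration, not the paper's implementation: the node name, function name, and goal coordinates are hypothetical, and only the stock move_base/actionlib interface is used.

```python
#!/usr/bin/env python
# Illustrative sketch only (not the paper's implementation): send an estimated
# picker location to the ROS navigation stack as a move_base goal.
# Assumes a standard move_base action server running in the ROS-Gazebo
# simulation and goal coordinates expressed in the "map" frame.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal


def send_picker_goal(x, y):
    """Dispatch the estimated picker position (x, y) as a navigation goal."""
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0  # identity orientation

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()


if __name__ == '__main__':
    rospy.init_node('picker_assist_goal_sender')
    # Hypothetical picker location produced by a DTC location-estimation module.
    send_picker_goal(3.5, 1.2)
```

In such a setup, the DTC pipeline's location-estimation output would simply be fed to a call like `send_picker_goal`, letting the simulated robot plan and execute the approach to the picker with the existing ROS navigation stack.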