Detecting grape stems is essential for the autonomous operation of grape-picking robots. In natural orchards, the complex and irregular positioning of grape bunches and the frequent occlusions caused by obstacles and leaves create a dynamic, unpredictable environment that substantially degrades the robot's perception quality and harvesting efficiency. Inspired by human active-observation mechanisms, this study introduces an end-to-end active visual perception framework based on deep reinforcement learning (DRL) to improve the detection of grape stems under complex occlusion. An instance segmentation network is trained to obtain 2D masks, which are integrated into an octree volumetric grid to produce detailed volumetric observations for DRL training. Different occupancy weights are assigned to explored and newly discovered regions, yielding a novel information-gain-based reward function that drives the network to optimize camera movements and guide the robotic arm toward the best viewpoint for efficient grape stem detection. The superior performance of the proposed method has been validated in both laboratory and outdoor orchard settings. The primary contribution of this work is a novel, fully end-to-end detection framework for occluded grape stems. Compared with existing methods, it enables the system to learn optimal viewpoint strategies directly through interactions between the robotic arm and the environment, handling feature extraction and environmental modeling without manually designed information-gain metrics. This advancement provides foundational support for the next generation of highly autonomous picking robots capable of operating adaptively in complex, unstructured environments.
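To make the reward idea concrete, the following is a minimal sketch of an information-gain reward with distinct occupancy weights for explored and newly discovered regions, as the abstract describes. The octree volumetric grid is approximated here by a dense boolean occupancy array, and all names, weight values, and grid dimensions are illustrative assumptions rather than the paper's actual implementation.

```python
# Sketch only: approximates the octree with a dense boolean grid; the weights
# W_EXPLORED / W_NEW and the normalization are assumed, not taken from the paper.
import numpy as np

W_EXPLORED = 0.2   # assumed weight for re-observing already-explored voxels
W_NEW = 1.0        # assumed weight for newly discovered voxels

def information_gain_reward(known: np.ndarray, observed: np.ndarray) -> float:
    """Weighted information gain of a single observation.

    known:    boolean grid of voxels already explored before this view.
    observed: boolean grid of voxels covered by the current camera view.
    Returns a scalar reward that favors viewpoints revealing unseen regions.
    """
    new_voxels = observed & ~known    # voxels seen for the first time
    revisited = observed & known      # voxels seen again
    gain = W_NEW * new_voxels.sum() + W_EXPLORED * revisited.sum()
    return float(gain) / max(observed.size, 1)

# Usage: fold each observation into the map so later rewards reflect exploration.
known = np.zeros((32, 32, 32), dtype=bool)
view = np.zeros_like(known)
view[8:16, 8:16, 8:16] = True         # toy stand-in for a camera frustum
reward = information_gain_reward(known, view)
known |= view                         # update the explored map after the step
```

Weighting revisited voxels lower than new ones, as sketched above, pushes the DRL policy to move the camera toward unexplored regions rather than re-observing the same occluded area, which is the behavior the abstract's reward is designed to induce.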