Abstract
Occlusion is one of the key factors affecting the success rate of vision‐based fruit‐picking robots. Accurately locating and grasping occluded fruit is important in field applications, yet no universal and effective solution exists. In this paper, a high‐precision estimation method for the spatial geometric features of occluded targets, based on deep learning and multisource images, is presented, enabling a selective harvesting robot to envision the whole target fruit as if its occlusions did not exist. First, RGB, depth, and infrared images are acquired, and pixel‐level‐matched RGB‐D‐I fusion images are obtained by image registration. Second, to detect occluded tomatoes in the greenhouse, an extended Mask‐RCNN network is designed to extract the target tomato, improving target segmentation accuracy by 7.6%. Then, for partially occluded tomatoes, a shape and position restoration method is used to recover the obscured portion of the fruit. This algorithm extracts the tomato radius and centroid coordinates directly from the restored depth image, achieving a mean Intersection over Union of 0.895 and a centroid position error of 0.62 mm for occlusion rates under 25% and illuminance between 1 and 12 klx. On this basis, a dual‐arm robotic harvesting system is developed, achieving a picking time of 11 s per fruit, an average gripping accuracy of 8.21 mm, and an average picking success rate of 73.04%. The proposed approach realizes high‐fidelity geometric reconstruction rather than mere image‐style restoration, endowing the robot with the ability to "see through" obstacles in field scenes and improving its operational success rate.
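The abstract does not detail the restoration algorithm, which in the paper operates on restored depth images. As a simplified, hypothetical 2-D illustration of the underlying idea (not the authors' actual method), the visible boundary arc of a partially occluded circular fruit can be used to recover the full circle's center and radius via an algebraic least-squares circle fit; the function name `fit_circle` and the synthetic arc data below are illustrative assumptions.

```python
import numpy as np

def fit_circle(points):
    """Least-squares (Kasa) circle fit: recover the center and radius
    of a circular fruit from the points of its visible boundary arc.

    Solves x^2 + y^2 + D*x + E*y + F = 0 in the least-squares sense,
    then converts (D, E, F) to center (cx, cy) and radius r.
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    D, E, F = np.linalg.lstsq(A, b, rcond=None)[0]
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx**2 + cy**2 - F)
    return (cx, cy), r

# Synthetic example: only a partial arc of a fruit outline is visible
# (center (5, 3), radius 2); the rest is assumed hidden by a leaf.
theta = np.linspace(0.2, 2.0, 30)
pts = np.column_stack([5 + 2 * np.cos(theta), 3 + 2 * np.sin(theta)])
(cx, cy), r = fit_circle(pts)
```

With noise-free arc points the fit recovers the center and radius exactly; on real segmentation boundaries the same formulation gives a robust least-squares estimate from whatever portion of the contour is unoccluded.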