Mechanized harvesting is the key technology to solving the high cost and low efficiency of manual harvesting, and the key to realizing mechanized harvesting lies in the accurate and fast identification and localization of targets. In this paper, a lightweight YOLOv5s model is improved for efficiently identifying grape fruits and stems. On the one hand, it improves the CSP module in YOLOv5s using the Ghost module, reducing model parameters through ghost feature maps and cost-effective linear operations. On the other hand, it replaces traditional convolutions with deep convolutions to further reduce the model’s computational load. The model is trained on datasets under different environments (normal light, low light, strong light, noise) to enhance the model’s generalization and robustness. The model is applied to the recognition of grape fruits and stems, and the experimental results show that the overall accuracy, recall rate, mAP, and F1 score of the model are 96.8%, 97.7%, 98.6%, and 97.2% respectively. The average detection time on a GPU is 4.5 ms, with a frame rate of 221 FPS, and the weight size generated during training is 5.8 MB. Compared to the original YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x models under the specific orchard environment of a grape greenhouse, the proposed model improves accuracy by 1%, decreases the recall rate by 0.2%, increases the F1 score by 0.4%, and maintains the same mAP. In terms of weight size, it is reduced by 61.1% compared to the original model, and is only 1.8% and 5.5% of the Faster-RCNN and SSD models, respectively. The FPS is increased by 43.5% compared to the original model, and is 11.05 times and 8.84 times that of the Faster-RCNN and SSD models, respectively. On a CPU, the average detection time is 23.9 ms, with a frame rate of 41.9 FPS, representing a 31% improvement over the original model. The test results demonstrate that the lightweight-improved YOLOv5s model proposed in the study, while maintaining accuracy, significantly reduces the model size, enhances recognition speed, and can provide fast and accurate identification and localization for robotic harvesting.