Rapid and accurate detection of tender tea buds in the natural tea garden environment is the basis for intelligent tea picking. However, complex models impose high hardware computing requirements and limit the deployment of tea bud recognition models on tea-picking robots. Therefore, this paper investigates a high-precision, lightweight target detection model based on an improved you-only-look-once version 4 (YOLOv4). The lightweight network GhostNet replaces the backbone of YOLOv4, and depthwise separable convolutions substitute for standard convolutions, significantly reducing the computational load and complexity of the model. Additionally, the convolutional block attention module (CBAM) is embedded into the path aggregation network (PANet) to enhance the model's feature extraction capability. To address the difficulty of detecting and distinguishing overlapping one-bud-one-leaf and one-bud-two-leaf targets, this paper replaces the CIoU loss function in YOLOv4 with the SIoU loss function. The SIoU loss function considers the vector angle between the ground truth box and the prediction box and redefines the penalty terms, improving the training speed and detection accuracy of the model. The experimental results show that the detection accuracy of the proposed approach is 85.15% for one-bud-one-leaf and one-bud-two-leaf targets, with 6.594 G floating-point operations (GFLOPs) and 11.353 M parameters. Relative to the original YOLOv4, the proposed algorithm improves mean accuracy by 1.08%, reduces computational complexity by 89.11%, and reduces the number of parameters by 82.36%. Compared to YOLOv4, the Tea-YOLO algorithm demonstrates significantly better detection performance at different viewing angles and in natural environments.
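The parameter savings from depthwise separable convolution can be illustrated with simple arithmetic: a standard k×k convolution needs k·k·C_in·C_out weights, while a depthwise k×k convolution followed by a 1×1 pointwise convolution needs only k·k·C_in + C_in·C_out. A minimal sketch with an illustrative layer shape (not a layer taken from the paper):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

# Illustrative shape: 3x3 kernel, 256 input and 256 output channels.
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)        # 589,824 weights
dsc = depthwise_separable_params(k, c_in, c_out)  # 67,840 weights
print(f"standard: {std}, separable: {dsc}, ratio: {dsc / std:.3f}")
```

The ratio approaches the theoretical 1/C_out + 1/k² (about 0.115 here), which is consistent with the order-of-magnitude reductions in complexity and parameters reported above.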
The algorithm proposed in this paper can detect one-bud-one-leaf and one-bud-two-leaf targets quickly and accurately, reducing the cost and difficulty of deploying the vision module of tea-picking robots.
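The SIoU loss mentioned above combines the IoU term with angle-aware distance and shape penalties. A minimal sketch following the commonly cited SIoU formulation (angle cost, distance cost rescaled by the angle, and shape cost); the exact constants in the paper's variant may differ:

```python
import math

def siou_loss(box1, box2, theta=4.0, eps=1e-9):
    """SIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    b1x1, b1y1, b1x2, b1y2 = box1
    b2x1, b2y1, b2x2, b2y2 = box2

    # IoU term
    iw = max(0.0, min(b1x2, b2x2) - max(b1x1, b2x1))
    ih = max(0.0, min(b1y2, b2y2) - max(b1y1, b2y1))
    inter = iw * ih
    union = ((b1x2 - b1x1) * (b1y2 - b1y1)
             + (b2x2 - b2x1) * (b2y2 - b2y1) - inter)
    iou = inter / (union + eps)

    # Box centers and size of the smallest enclosing box
    cx1, cy1 = (b1x1 + b1x2) / 2, (b1y1 + b1y2) / 2
    cx2, cy2 = (b2x1 + b2x2) / 2, (b2y1 + b2y2) / 2
    cw = max(b1x2, b2x2) - min(b1x1, b2x1)
    ch = max(b1y2, b2y2) - min(b1y1, b2y1)

    # Angle cost: 0 when centers are axis-aligned, 1 at 45 degrees
    sigma = math.hypot(cx2 - cx1, cy2 - cy1)
    sin_alpha = min(abs(cy2 - cy1) / (sigma + eps), 1.0)
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, rescaled by the angle cost
    gamma = 2 - angle
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    dist = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes width/height mismatch between the boxes
    w1, h1 = b1x2 - b1x1, b1y2 - b1y1
    w2, h2 = b2x2 - b2x1, b2y2 - b2y1
    omega_w = abs(w1 - w2) / (max(w1, w2) + eps)
    omega_h = abs(h1 - h2) / (max(h1, h2) + eps)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2

# Identical boxes give a loss of (numerically) zero; a shifted
# prediction incurs a positive penalty.
print(siou_loss((0, 0, 10, 10), (0, 0, 10, 10)))
print(siou_loss((0, 0, 10, 10), (2, 3, 12, 13)))
```

Because the angle cost feeds into the distance penalty, misalignment between the prediction and ground-truth centers is penalized more strongly, which is the mechanism the abstract credits for faster convergence on overlapping bud-leaf targets.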