Green fruit detection is important for estimating orchard yield and for allocating water and fertilizer. However, because green fruit is similar in color to the image background, backgrounds are complex, and green fruit datasets are difficult to collect, no accurate and convenient green fruit detection method is currently available for small datasets. The YOLO object detection model, a representative single-stage detection framework, offers a flexible structure, fast inference and broad versatility. In this study, we propose a model based on an improved YOLOv5 that is combined with data augmentation methods to detect green fruit in small datasets with similarly colored backgrounds. In the improved model (YOLOv5-AT), a Conv-AT block and SA and CA blocks are designed to construct feature information from different perspectives and to improve accuracy by conveying local key information to deeper layers. The proposed method was applied to green oranges, green tomatoes and green persimmons, and its mAPs were higher than those of other YOLO object detection models, reaching 84.6%, 98.0% and 85.1%, respectively. Furthermore, taking green oranges as an example, a mAP of 82.2% was obtained when only 50% of the original dataset (163 images) was retained, only 2.4 percentage points lower than the mAP obtained when training on 100% of the dataset (326 images). Thus, the YOLOv5-AT model combined with data augmentation methods can achieve accurate detection on small green fruit datasets with similarly colored backgrounds. These results could provide supporting data for improving the efficiency of agricultural production.
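
The green orange comparison between the full and the halved training set can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are illustrative and are not taken from the original work.

```python
# Figures reported in the abstract for green oranges (mAP in percent).
map_full = 84.6      # mAP when training on 100% of the dataset
map_half = 82.2      # mAP when training on 50% of the dataset
images_full = 326    # images in the full dataset
images_half = 163    # images retained in the reduced dataset

# Illustrative names below; they are assumptions, not the authors' code.
retained_fraction = images_half / images_full
drop_points = map_full - map_half              # absolute gap, in percentage points
relative_drop = 100 * drop_points / map_full   # gap as a share of the full-data mAP

print(f"retained fraction: {retained_fraction:.2f}")
print(f"absolute drop: {drop_points:.1f} percentage points")
print(f"relative drop: {relative_drop:.1f}%")
```

The gap between the two mAP values is an absolute difference of 2.4 percentage points, which corresponds to a relative decrease of roughly 2.8% with respect to the full-dataset mAP.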