In agricultural production, rapid and accurate detection of peach blossoms plays a crucial role in yield prediction and is the foundation for automated blossom thinning. Current manual detection and counting methods are extremely time-consuming and labor-intensive, and are prone to human error. To address these issues, this paper proposes a YOLOv5-based model for detecting peach blossoms in natural environments. First, a cascaded network is used to add an output layer dedicated to small-target detection to the original three output layers. Second, a context extraction module (CAM) and a feature refinement module (FSM) are combined and added to the network. Third, the K-means++ algorithm is used to cluster and statistically analyze the size ranges of the multi-scale targets, yielding candidate (anchor) box sizes suited to the dataset. Finally, the SIoU bounding box regression loss function, which incorporates the directional information between the ground-truth box and the predicted box, is adopted to improve detection accuracy. Experimental results show that, compared with the original YOLOv5s model, the proposed model improves the AP values for the three peach blossom stages, namely bud, flower, and falling flower, by 7.8%, 10.1%, and 3.4%, respectively, and increases the overall mAP for peach blossom recognition by 7.1%. The model also achieves good results in detecting the flowering volume of peach blossoms. It provides an effective means of obtaining more intuitive and accurate data for peach yield prediction and lays a theoretical foundation for the development of thinning robots.
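The abstract only names K-means++ as the way candidate box sizes are derived. The sketch below shows one common way such anchor clustering is done, under stated assumptions: boxes are reduced to (width, height), the distance is 1 − IoU (typical for YOLO anchor design but not stated in the paper), and centres are updated with the mean of their members. The helper names `wh_iou`, `kmeans_pp_init`, and `kmeans_anchors` are hypothetical and are not the authors' code.

```python
import random
from typing import List, Tuple

# A labelled box reduced to its (width, height) in pixels; position is ignored.
Box = Tuple[float, float]


def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same centre, so only width/height matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def distance(a: Box, b: Box) -> float:
    # Assumed metric: 1 - IoU, common for YOLO anchor clustering (not stated in the paper).
    return 1.0 - wh_iou(a, b)


def kmeans_pp_init(boxes: List[Box], k: int, rng: random.Random) -> List[Box]:
    """K-means++ seeding: each new centre is sampled with probability proportional to D(x)^2."""
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(distance(b, c) for c in centres) ** 2 for b in boxes]
        if sum(d2) == 0:
            # Every box coincides with an existing centre; fall back to a uniform choice.
            centres.append(rng.choice(boxes))
        else:
            centres.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centres


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box sizes into k anchors (hypothetical helper), returned sorted by area."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(boxes, k, rng)
    for _ in range(iters):
        # Assignment step: attach each box to the centre with the smallest 1 - IoU.
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            best = min(range(k), key=lambda i: distance(b, centres[i]))
            clusters[best].append(b)
        # Update step: move each centre to the mean width/height of its members.
        new_centres = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centres.append((w, h))
            else:
                new_centres.append(centres[i])  # keep an empty cluster's centre unchanged
        if new_centres == centres:
            break  # converged: centres no longer move
        centres = new_centres
    return sorted(centres, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    # Toy (width, height) labels: three small buds and three larger open flowers.
    toy = [(10, 12), (11, 10), (12, 11), (100, 90), (95, 105), (105, 100)]
    print(kmeans_anchors(toy, k=2))
```

For a four-output network like the one described, one might call `kmeans_anchors(all_label_sizes, k=12)` and hand the anchors out three at a time, smallest first, to the output layers from highest to lowest resolution; both the anchor count and this mapping are assumptions borrowed from standard YOLOv5 practice, not details given in the abstract.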