To address the low accuracy of multi-scale seafloor target detection in side-scan sonar images with high noise and complex background texture, a multi-scale target detection model based on the BES-YOLO network is proposed. First, an efficient multi-scale attention (EMA) mechanism is incorporated into the backbone of the YOLOv8 network. Second, a bi-directional feature pyramid network (BiFPN) is introduced to fuse information across different scales. Finally, a Shape_IoU loss function is introduced to optimize the model and improve its accuracy. Before training, the dataset is preprocessed using 2D discrete wavelet decomposition and reconstruction to enhance the robustness of the network. The experimental results show that the BES-YOLO network achieves a mean average precision of 92.4% at an IoU threshold of 0.5 (mAP@0.5) and of 67.7% averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95), increases of 5.3% and 4.4%, respectively, over the YOLOv8n model. These results indicate that the method improves the detection accuracy and efficiency of multi-scale targets in side-scan sonar images and can be applied to AUVs and other underwater platforms for intelligent detection of undersea targets.
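The IoU thresholds behind the two reported metrics can be illustrated with a minimal sketch. It is not taken from the BES-YOLO implementation: the names `box_iou`, `IOU_THRESHOLDS`, and `thresholds_matched` are hypothetical, the boxes are made-up values, and the 0.05 step between thresholds is an assumption based on the common COCO-style convention, which the text does not state.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_matched(pred, truth):
    """Thresholds at which `pred` would count as a correct detection of `truth`."""
    iou = box_iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if iou >= t]


# Illustrative boxes only, not data from the paper.
pred = (10, 10, 50, 50)
truth = (10, 10, 50, 58)
print(box_iou(pred, truth))
print(thresholds_matched(pred, truth))
```

A prediction counts toward mAP@0.5 whenever its IoU with the ground truth is at least 0.5, whereas mAP@0.5:0.95 averages the result over all ten thresholds, so it rewards tighter localization.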