Weeds severely affect sesame throughout its early growth stages, so they must be rigorously controlled. However, sesame seedlings and weeds are similar in shape and vary widely in size, making reliable weed detection difficult. Most current approaches to weed recognition train a deep learning model on weed images, but mainstream deep learning detectors tend to miss weed targets whose size and shape vary widely. As a result, standard deep learning models leave room for improvement in sesame and weed recognition rates. The YOLO-sesame model is proposed to improve the efficiency and accuracy of sesame weed detection. Building on the YOLOv4 model, an attention mechanism is introduced: local importance pooling is added to the SPP layer, and an SE module is applied on top of it as a logical module. To address the large variation in target size, an adaptive spatial feature fusion (ASFF) structure is added at the feature fusion level. Experimental results show that the proposed YOLO-sesame model outperforms mainstream detectors such as Fast R-CNN, SSD, YOLOv3, YOLOv4, and YOLOv4-tiny. It achieved F1 scores of 0.91 for sesame crops and 0.92 for weeds, an mAP of 96.16%, and a detection rate of 36.8 frames per second. In conclusion, the YOLO-sesame model meets the requirements for accurate sesame weed detection.
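
For readers unfamiliar with these components, below is a minimal PyTorch sketch of an SPP block whose parallel pooling branches use local importance pooling (LIP) and whose concatenated output is re-weighted by an SE channel-attention module. It follows the standard published formulations of SE (Hu et al., 2018) and LIP (Gao et al., 2019); the class names, kernel sizes, and exact wiring are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (standard formulation)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))            # squeeze: global average pool -> (b, c)
        w = self.fc(w).view(b, c, 1, 1)   # excitation: per-channel gate in (0, 1)
        return x * w


class LIP(nn.Module):
    """Local Importance-based Pooling: average pooling weighted by
    learned, exponentiated importance logits."""

    def __init__(self, channels, kernel_size, stride=1):
        super().__init__()
        self.logit = nn.Conv2d(channels, channels, 1)  # importance logits
        self.pool = nn.AvgPool2d(kernel_size, stride=stride,
                                 padding=kernel_size // 2)

    def forward(self, x):
        # exp() turns logits into positive weights; clamp for stability
        w = torch.exp(self.logit(x).clamp(max=10))
        return self.pool(x * w) / (self.pool(w) + 1e-6)


class SPPWithLIPSE(nn.Module):
    """SPP block whose parallel pooling branches use LIP instead of max
    pooling, followed by SE attention on the concatenated features.
    Kernel sizes 5/9/13 mirror the YOLOv4 SPP defaults (an assumption)."""

    def __init__(self, channels, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.branches = nn.ModuleList(LIP(channels, k) for k in kernel_sizes)
        self.se = SEBlock(channels * (len(kernel_sizes) + 1))

    def forward(self, x):
        feats = [x] + [branch(x) for branch in self.branches]
        return self.se(torch.cat(feats, dim=1))


# Usage on a YOLOv4-sized backbone feature map (batch 1, 512 channels, 13x13)
if __name__ == "__main__":
    x = torch.randn(1, 512, 13, 13)
    print(SPPWithLIPSE(512)(x).shape)  # torch.Size([1, 2048, 13, 13])
```

With stride 1 and padding of half the kernel size, each LIP branch preserves spatial resolution, so the four feature maps concatenate cleanly along the channel axis before the SE gate re-weights them.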