Surface-defect inspection is vital in cold-rolled steel-strip manufacturing, where production environments are complex and line speeds are high. Moreover, defects on cold-rolled steel strips are often small, diverse in type, and visually similar across types, which makes it difficult to balance detection accuracy against efficiency. To address these challenges, we designed a detector based on You Only Look Once version 5 (YOLOv5) for precise detection of surface defects on cold-rolled steel strips. First, we curated a dataset containing seven types of defects, named the Cold-Rolled Steel Defect Dataset (CR7-DET). Next, we developed a feature-extraction network based on Res2Net, which builds residual-like connections within a single residual block, to enhance the model’s feature-extraction capability, and introduced a multi-head attention module to focus on key features. To reduce information loss during feature fusion, we established an adaptive feature-fusion Path Aggregation Network (aff-PAN) and optimized it with a lightweight adaptive down-sampling module (LAD) that enlarges the receptive field used in feature fusion. Ghost convolution reduced the number of parameters and increased speed without degrading the model’s performance. Finally, experiments were conducted on CR7-DET and a public dataset (GC10-DET). With a reduced parameter count of 6.85 million, our model achieved a mean average precision (mAP) of 87.6% on CR7-DET and 79.7% on GC10-DET. The experimental results demonstrate that our model balances detection accuracy and inference efficiency. The model has the potential to reduce scrap rates caused by defects and to improve the overall surface quality of cold-rolled steel strips.
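To illustrate why ghost convolution lowers the parameter count, the following minimal sketch compares the weight count of a standard convolution with that of a generic ghost module, in which a primary convolution produces a fraction of the output channels and a cheap depthwise convolution generates the rest. The function names, the ratio of 2, the 5×5 cheap kernel, and the channel sizes are illustrative assumptions and are not taken from the model described here; biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2-D convolution with a k x k kernel (no bias)."""
    return (c_in // groups) * c_out * k * k


def ghost_conv_params(c_in, c_out, k, ratio=2, cheap_k=5):
    """Weight count of a generic ghost module (assumed form, no bias).

    ratio and cheap_k are illustrative assumptions, not values from the text.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    # Primary convolution: produces only c_out / ratio "intrinsic" channels.
    intrinsic = c_out // ratio
    primary = conv_params(c_in, intrinsic, k)
    # Cheap operation: a depthwise convolution that derives the remaining
    # "ghost" channels from the intrinsic ones.
    ghost = c_out - intrinsic
    cheap = conv_params(intrinsic, ghost, cheap_k, groups=intrinsic)
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 128 input channels, 256 output channels, 3x3 kernel.
    standard = conv_params(128, 256, 3)      # 294912 weights
    ghost = ghost_conv_params(128, 256, 3)   # 150656 weights
    print(f"standard: {standard}, ghost: {ghost}, "
          f"reduction: {standard / ghost:.2f}x")
```

With a ratio of 2, the weight count of this hypothetical layer is roughly halved, because the expensive dense convolution computes only half of the output channels and the depthwise step adds comparatively few weights.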