Abstract

In contemporary industrial systems, surface quality inspection has become an essential part of factory quality control. The cascade region-based convolutional neural network (cascade R-CNN), a deep-learning algorithm for object detection and instance segmentation, has been widely applied in industrial settings. Nonetheless, there is still room for improvement in detecting defects on metal surfaces: the baseline cascade R-CNN performs relatively poorly on this task, but its performance improves markedly when it is combined with several recently proposed modules. This article proposes an enhanced metal defect detection method based on cascade R-CNN. Specifically, an improved backbone network is employed to extract image features, which enables more precise localization. In addition, upsampling and downsampling are combined to extract multiscale defect feature maps, and contrast enhancement based on histogram equalization is used to address the low contrast of the images in the dataset. Experimental results show that the proposed approach achieves a mean average precision (mAP) of 0.754 on the NEU-DET dataset and outperforms the baseline cascade R-CNN model by 9.2%.
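To make the contrast enhancement step concrete, the following is a minimal sketch of plain global histogram equalization on a grayscale image. The abstract does not state which equalization variant the authors use, so the global form, the function name `equalize_histogram`, and the nested-list image representation are all assumptions made for illustration only.

```python
from typing import List

Image = List[List[int]]  # hypothetical representation: rows of integer gray levels


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Global histogram equalization (an assumed variant, not the paper's code)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Build the cumulative distribution of gray levels.
    total = sum(hist)
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # CDF value at the darkest gray level that is actually present.
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        # Empty or constant image: there is no contrast to stretch.
        return [row[:] for row in image]

    # Map each gray level so that the CDF becomes approximately linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in image]


low_contrast = [[100, 105], [110, 115]]
print(equalize_histogram(low_contrast))  # [[0, 85], [170, 255]]
```

In this toy example the four gray levels, originally packed into the range 100 to 115, are spread over the full 0 to 255 range, which is the effect that makes faint surface defects easier to distinguish from the background.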