Longitudinal cracks are a typical surface defect of steel slabs, and predicting them accurately is of great significance for improving slab quality. In actual production, however, the numbers of normal and longitudinally cracked slabs are extremely imbalanced, which poses a major challenge for modeling and prediction. To address this problem, this paper proposes a prediction method for longitudinal cracks under data imbalance. First, multiple sampling methods (SMOTE, BorderlineSMOTE, SMOTE-ENN and SMOTE-Tomek) were each used to construct a feature data set in order to alleviate the imbalance. Then, based on the resampled data sets, prediction models for longitudinal cracks were built using Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), XGBoost and LightGBM as classification algorithms. The models' hyperparameters were determined by Bayesian optimisation, and the optimal classification algorithm was selected according to the evaluation metrics. In addition, SHapley Additive exPlanations (SHAP) was used to analyse the model and verify the influence of the input parameters on the model output. The results show that, compared with the other models, the combination of SMOTE sampling and the LightGBM classification algorithm handles the imbalanced data better when predicting longitudinal cracks. The recall for normal and longitudinally cracked slabs is 90.57% and 84.62%, respectively, the false alarm rate is 9.44% and the AUC is 0.93. The training time of this model is about 0.15 s and its prediction time is less than 0.01 s, so the time consumed in each stage is significantly shorter than for the other models. The method therefore shows clear advantages and provides a reliable way of predicting longitudinal cracks.
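The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it relies on: SMOTE-style oversampling of the minority (cracked) class, and the per-class recall and false alarm rate used for evaluation. The function names `smote_oversample` and `recall_and_false_alarm` are hypothetical, the toy data are invented for illustration, and the false alarm rate is assumed here to mean the fraction of normal slabs flagged as cracked. Production work would more likely use a maintained library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (illustrative SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall_and_false_alarm(y_true, y_pred, positive=1):
    """Per-class recall and false alarm rate for a binary problem.

    Assumes both classes occur in y_true; `positive` marks the cracked class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "recall_cracked": tp / (tp + fn),
        "recall_normal": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # assumed definition: FP / (FP + TN)
    }


# Toy usage with invented numbers (not data from the study).
cracked = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2]]
extra = smote_oversample(cracked, n_new=4, k=2)
metrics = recall_and_false_alarm([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
# metrics: recall_cracked 0.5, recall_normal 0.75, false_alarm_rate 0.25
```

Under the assumed definition, the false alarm rate is the complement of the recall for normal slabs, which is why the two metrics in the toy example sum to 1.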