In machine learning, predicting the mechanical properties of stainless steel, such as Yield Strength (YS), Ultimate Tensile Strength (UTS), and Elongation (EL), requires many input variables, including chemical composition, type of heat treatment, heating duration, and cooling method. However, the number and complexity of these variables can increase processing time and reduce model accuracy. This study explores the impact of selecting the most influential input variables on prediction accuracy. We compared two feature selection techniques: Recursive Feature Elimination (RFE), which systematically removes less important features, and Information Gain (IG), which measures the contribution of each variable to the target prediction. Both techniques were implemented with the random forest algorithm, chosen for its robustness in handling large datasets and its ability to capture complex interactions between variables. Hyperparameters were optimized via grid search. The analysis showed that the RFE-based model outperformed both the IG-based model and the model without feature selection. In predicting YS, RFE identified 13 of the 21 variables as influential, achieving a Mean Absolute Error (MAE) of 9.91, Root Mean Square Error (RMSE) of 14.20, and R-squared value of 0.89. For UTS, RFE selected 8 of the 21 variables, with an MAE of 12.89, RMSE of 16.97, and R-squared of 0.97. In predicting EL, RFE selected 14 of the 21 variables, with an MAE of 3.82, RMSE of 6.10, and R-squared of 0.85. The high R-squared values (0.85 or above) across all three properties indicate strong predictive capability, making the model suitable for practical applications in predicting the mechanical properties of stainless steel.
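The pipeline described above (RFE feature selection around a random forest, tuned by grid search, evaluated with MAE, RMSE, and R-squared) can be sketched in scikit-learn. The dataset, target, feature count, and hyperparameter grid below are illustrative placeholders, not the study's actual data or settings; the 21 inputs and the 13 features retained mirror the YS configuration reported in the abstract.

```python
# Hedged sketch: RFE + random forest + grid search, on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 21))                          # 21 input variables, as in the study
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)   # synthetic target (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RFE recursively drops the least important features, ranked by the
# random forest's feature importances (here keeping 13, as for YS).
selector = RFE(RandomForestRegressor(random_state=0), n_features_to_select=13)
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

# Grid search over a small, purely illustrative hyperparameter grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X_tr_sel, y_tr)

# The three metrics reported in the abstract.
pred = grid.predict(X_te_sel)
mae = mean_absolute_error(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
r2 = r2_score(y_te, pred)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```

An IG-based comparison would swap the RFE step for a filter such as `sklearn.feature_selection.SelectKBest` with `mutual_info_regression`, keeping the rest of the pipeline unchanged.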