In previous literature on predicting the compressive strength (CS) of concrete using machine learning (ML), the focus has primarily been on algorithm-specific improvements, with less emphasis on improving CS prediction from the data perspective. Departing from these conventional investigations, this study presents data-centric enhancements. A systematic and comprehensive methodology is adopted, involving a series of steps. First, a large dataset (more than 24,214 data points) comprising 26 input features related to mixture proportions, engineered ratios, and atmospheric parameters is prepared and processed. Feature engineering is then performed to select and process the data features. Subsequently, six scenarios, each using a distinct combination of input features, are designed and investigated for CS prediction. Finally, the CS predictions for all six scenarios are compared using five well-known ML models, and the results are validated using statistical hypothesis testing. The results show significant improvements in CS prediction: incorporating engineered ratios and atmospheric parameters alongside mixture proportions as inputs yields a 12.8 % increase in the coefficient of determination (R²) and a 61 % reduction in the root mean square error (RMSE). Based on CS prediction across the six scenarios, the study discusses the concrete data perspective, the role of engineered ratios and atmospheric features, and the interaction and influence of input features in CS prediction. In conclusion, this work emphasizes the importance of improving data management practices in concrete performance assessment to achieve more efficient, realistic, and sustainable outcomes in the concrete industry.
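To make the two reported metrics concrete, here is a minimal sketch of how R² and RMSE, and their relative change between a baseline scenario (for example, mixture proportions only) and an enriched-input scenario, could be computed. The function names and toy values are hypothetical illustrations, not the study's code or data, and the sketch assumes both reported percentages are relative changes with respect to the baseline scenario.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted CS values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline: float, new: float) -> float:
    """Percentage change of a metric relative to the baseline scenario."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical observed CS values (MPa) and predictions from two scenarios.
y_true = [30.0, 40.0, 50.0, 60.0]
baseline_pred = [34.0, 36.0, 54.0, 56.0]   # e.g. mixture proportions only
enriched_pred = [31.0, 39.0, 51.0, 59.0]   # e.g. plus engineered/atmospheric inputs

r2_gain = relative_change(r2(y_true, baseline_pred), r2(y_true, enriched_pred))
rmse_change = relative_change(rmse(y_true, baseline_pred), rmse(y_true, enriched_pred))
print(f"R2 change: {r2_gain:.1f} %, RMSE change: {rmse_change:.1f} %")
```

A negative RMSE change corresponds to an error reduction. In practice, the same comparison would be repeated for each of the five models and six scenarios before any hypothesis test is applied.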