Interpretable machine learning-based text classification method for construction quality defect reports

Zheng Wang, Cheng Wu, Yao Wang, Cheng Wang, Zhaoyun Zhang

doi:10.1016/j.jobe.2024.109330

Abstract

Efficient identification and remediation of construction defects are critical to the quality and success of engineering projects, but the complexity of construction environments makes this difficult. Current research relies predominantly on statistical and causal analyses of defect detection reports, yet these analyses are time-consuming and error-prone because the reports are unstructured. Machine learning techniques have therefore been applied to classify defect texts rapidly and accurately. Existing studies, however, focus primarily on improving model performance and neglect both interpretability and the effect of imbalanced data. This study introduces RF-SMOTE, an oversampling technique based on Random Forest (RF), to address the limitations of traditional oversampling methods such as the Synthetic Minority Over-sampling Technique (SMOTE). Comparative analyses demonstrate the efficacy of RF-SMOTE in mitigating the effects of imbalanced data. The study also explores the application of interpretability methods based on SHAP (SHapley Additive exPlanations) to construction management decision-making, addressing a gap in existing research. The contributions are threefold: interpretable machine learning solutions for defect text classification, a discussion of the effect of imbalanced data, and proposed SHAP-based application scenarios.

Full Text