Data-driven automatic classification model for construction accident cases using natural language processing with hyperparameter tuning

Louis Kumi, Jaewook Jeong, Jaemin Jeong

doi:10.1016/j.autcon.2024.105458

Abstract

The construction industry, while vital to societal progress, is marred by a high incidence of accidents and injuries. Manual classification of accident cases is labor-intensive and susceptible to human bias. This study addresses this challenge by developing an automated accident case classification system for the construction industry using natural language processing (NLP) and machine learning techniques. The study proceeded in four steps: (1) establishment of the dataset, (2) Korean natural language processing, (3) selection of machine learning models, and (4) model evaluation. The models exhibited competitive performance, with high accuracy, precision, and recall across all classification tasks. XGBoost outperformed Naive Bayes (NB), support vector machine (SVM), and k-nearest neighbors (KNN) for accident type, facility type, and work type, with accuracies of 0.80, 0.56, and 0.67, respectively. The results also provided insights into the factors influencing accident classification. This study contributes to construction safety by providing a data-driven foundation for safety decision-making, resource allocation, and benchmarking.
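The pipeline summarized above (tokenize Korean accident descriptions, train a classifier, evaluate with accuracy, precision, and recall) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It shows only one of the compared baselines, a multinomial Naive Bayes classifier. Several parts are assumptions: character bigrams stand in for a Korean morphological analyzer, precision and recall are macro-averaged because the abstract does not state the averaging scheme, and the class `CharNgramNaiveBayes`, the helper names, and the toy sentences and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Overlapping character n-grams with whitespace removed.

    Assumption: a dependency-free stand-in for Korean morphological analysis.
    """
    s = "".join(text.split())
    if len(s) < n:
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class CharNgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0, n=2):
        self.alpha = alpha
        self.n = n

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)          # documents per class (priors)
        self.token_counts = defaultdict(Counter)   # n-gram counts per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(char_ngrams(text, self.n))
        self.vocab = {t for c in self.classes for t in self.token_counts[c]}
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods of each n-gram.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for tok in char_ngrams(text, self.n):
                if tok in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.token_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall (averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
    }


# Hypothetical toy accident descriptions ("fall" vs. "caught-in" accidents).
train_texts = [
    "비계에서 작업 중 추락",
    "사다리에서 발을 헛디뎌 추락",
    "기계에 손이 끼임",
    "롤러 사이에 팔이 끼임",
]
train_labels = ["fall", "fall", "caught", "caught"]
test_texts = ["지붕에서 추락", "벨트에 손가락이 끼임"]
test_labels = ["fall", "caught"]

model = CharNgramNaiveBayes().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
scores = evaluate(test_labels, predictions)
```

The same `fit`/`predict`/`evaluate` loop would be repeated per model and per target (accident type, facility type, work type) to produce a comparison like the one reported; swapping in SVM, KNN, or XGBoost would require a library such as scikit-learn or xgboost and is not shown here.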
