Abstract

Literature on supervised Machine-Learning (ML) approaches for classifying text-based safety reports for the construction sector has been growing. Recent studies have emphasized the need to build ML approaches that balance high classification accuracy and performance on management criteria, such as resource intensiveness. However, despite being highly accurate, the extensively focused, supervised ML approaches may not perform well on management criteria as many factors contribute to their resource intensiveness. Alternatively, the potential for semi-supervised ML approaches to achieve balanced performance has rarely been explored in the construction safety literature. The current study contributes to the scarce knowledge on semi-supervised ML approaches by demonstrating the applicability of a state-of-the-art semi-supervised learning approach, i.e., Yet, Another Keyword Extractor (YAKE) integrated with Guided Latent Dirichlet Allocation (GLDA) for construction safety report classification. Construction-safety-specific knowledge is extracted as keywords through YAKE, relying on accessible literature with minimal manual intervention. Keywords from YAKE are then seeded in the GLDA model for the automatic classification of safety reports without requiring a large quantity of prelabeled datasets. The YAKE-GLDA classification performance (F1 score of 0.66) is superior to existing unsupervised methods for the benchmark data containing injury narratives from Occupational Health and Safety Administration (OSHA). The YAKE-GLDA approach is also applied to near-miss safety reports from a construction site. The study demonstrates a high degree of generality of the YAKE-GLDA approach through a moderately high F1 score of 0.86 for a few categories in the near-miss data. The current research demonstrates that, unlike the existing supervised approaches, the semi-supervised YAKE-GLDA approach can achieve a novel possibility of consistently achieving reasonably good classification performance across various construction-specific safety datasets yet being resource-efficient. Results from an objective comparative and sensitivity analysis contribute to much-required knowledge-contesting insights into the functioning and applicability of the YAKE-GLDA. The results from the current study will help construction organizations implement and optimize an efficient ML-based knowledge-mining strategy for domains beyond safety and across sites where the availability of a pre-labeled dataset is a significant limitation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call