Abstract

Government inspection reports document unsafe acts and conditions on construction sites, especially non-compliant practices by front-line managers that are rarely identified during self-inspections. Such information is a valuable learning source for improving construction management. However, non-compliance records in inspection reports are typically stored as unstructured text, which makes data analysis challenging. In response, this study presents an intelligent text mining framework that integrates graph analysis and visualization. The framework comprises data collection, preprocessing, and three levels of text analysis: word, sentence, and document. The word-level analysis (1) extracts keywords using KeyBERT and (2) identifies non-compliance issue types through community detection in a keyword co-occurrence graph. The sentence-level analysis automatically classifies text records from inspection reports by measuring the similarity between each text and each community and assigning the text to its most similar community. The document-level analysis identifies the interrelations between non-compliance issues through association rule mining and a community interaction network. The framework is validated on 6,153 text records describing non-compliance issues, drawn from 322 government on-site inspection reports in Shanghai, China. The results demonstrate that the critical word-level features of non-compliance issues can be accurately identified using KeyBERT, which outperforms other state-of-the-art methods. The approach also automates the development of a data-driven taxonomy for non-compliance issues and the classification of the corresponding records, requiring less manual intervention than conventional text classification models.
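The sentence-level and document-level steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the community keyword sets, the function names `assign_community` and `pair_rules`, the use of Jaccard overlap as the similarity measure, and the support and confidence thresholds are all assumptions made for illustration. Keyword extraction with KeyBERT and community detection are taken as already done and are not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical communities (issue type -> keyword set). In the framework these
# would come from community detection on the keyword co-occurrence graph.
COMMUNITIES = {
    "scaffolding": {"scaffold", "guardrail", "plank"},
    "electrical": {"cable", "distribution", "grounding"},
    "fire_safety": {"extinguisher", "welding", "flammable"},
}


def assign_community(keywords, communities):
    """Sentence level: pick the community most similar to a record's keywords.

    Similarity here is Jaccard overlap (an assumption; the abstract does not
    name the measure). Ties resolve to the first community in dict order.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    kw = set(keywords)
    return max(communities, key=lambda name: jaccard(kw, communities[name]))


def pair_rules(documents, min_support=0.5, min_confidence=0.6):
    """Document level: mine single-antecedent rules A -> B.

    Each document is the collection of issue types found in one report.
    Returns (antecedent, consequent, support, confidence) tuples.
    """
    n = len(documents)
    item_count = Counter()
    pair_count = Counter()
    for doc in documents:
        items = set(doc)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Toy input: each report is a list of records, each record a keyword list.
reports = [
    [["scaffold", "plank"], ["cable", "grounding"]],
    [["guardrail"], ["cable"]],
    [["extinguisher", "welding"]],
    [["scaffold"], ["grounding"], ["flammable"]],
]
labelled = [[assign_community(r, COMMUNITIES) for r in rep] for rep in reports]
for rule in pair_rules(labelled):
    print(rule)
```

On this toy input, scaffolding and electrical issues co-occur in three of the four reports, so the sketch reports a rule in each direction between them. A real analysis would use a full frequent-itemset miner such as Apriori rather than pairs only.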
