Ontology-Based Multilabel Text Classification of Construction Regulatory Documents

Peng Zhou, Nora El-Gohary

doi:10.1061/(asce)cp.1943-5487.0000530

Abstract

To fully automate the environmental regulatory compliance checking process, rules should be automatically extracted from applicable environmental regulatory textual documents, such as energy conservation codes. In the authors’ automated compliance checking (ACC) approach, the text is first classified into predefined categories prior to rule extraction, so that only relevant clauses are retrieved and irrelevant ones are filtered out, thereby improving the efficiency and accuracy of rule extraction. Machine learning (ML) techniques have been commonly used for text classification (TC), and non-ontology-based, ML-based TC has generally performed well. However, because high ACC performance requires exceptionally high TC performance, further improvement in TC is needed. To address this need, an ontology-based TC algorithm is proposed to further improve classification performance by utilizing the semantic features of the text. A domain ontology for conceptualizing the environment...
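To make the idea of classifying clauses before rule extraction more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical toy ontology (`TOY_ONTOLOGY`) in which each category label maps to a set of concept terms, and it replaces the paper's ML classifier with a simple threshold (`min_hits`) on concept-match counts. The helper names `tokenize`, `semantic_features`, and `classify` are illustrative only. A clause can receive several labels (multilabel), and a clause that receives no label is treated as irrelevant and filtered out.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy ontology: category label -> concept terms (incl. synonyms).
# This is an illustrative stand-in, NOT the domain ontology from the paper.
TOY_ONTOLOGY: Dict[str, Set[str]] = {
    "lighting": {"lighting", "luminaire", "lamp", "daylight"},
    "hvac": {"hvac", "heating", "cooling", "ventilation", "thermostat"},
    "envelope": {"insulation", "fenestration", "window", "roof", "wall"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the clause and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def semantic_features(text: str, ontology: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per category, how many clause tokens match that category's concepts."""
    tokens = tokenize(text)
    return {label: sum(tok in terms for tok in tokens) for label, terms in ontology.items()}


def classify(text: str, ontology: Dict[str, Set[str]], min_hits: int = 1) -> Set[str]:
    """Assign every label whose match count reaches min_hits (multilabel).

    An empty result means the clause is treated as irrelevant and filtered out.
    The threshold rule is an assumed simplification of an ML classifier.
    """
    feats = semantic_features(text, ontology)
    return {label for label, hits in feats.items() if hits >= min_hits}


if __name__ == "__main__":
    clauses = [
        "Each zone shall have a thermostat controlling heating and cooling.",
        "Window and roof insulation shall reduce the cooling load.",
        "This code shall be cited as the energy code.",
    ]
    # Keep only clauses that receive at least one label.
    relevant = [(c, classify(c, TOY_ONTOLOGY)) for c in clauses if classify(c, TOY_ONTOLOGY)]
    for clause, labels in relevant:
        print(sorted(labels), clause)
```

In this sketch the second clause is labeled both `envelope` and `hvac`, illustrating the multilabel case, while the third clause matches no concept and is dropped before any rule extraction would take place. Raising `min_hits` trades recall for precision; an actual ontology-based approach would typically derive richer semantic features (for example, concept hierarchies) and learn the decision from labeled data rather than use a fixed threshold.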
