Abstract

To fully automate environmental regulatory compliance checking, we need to automatically extract rules from the applicable environmental regulatory documents, such as energy conservation codes. In our automated compliance checking (ACC) approach, we classify the text into predefined categories before rule extraction, so that only relevant clauses are retrieved and irrelevant ones are filtered out, which improves the efficiency and accuracy of rule extraction. Machine learning (ML) techniques are commonly used for text classification (TC), and ML-based TC has generally performed well. However, ACC requires exceptionally high performance (100% recall and >85% precision) to avoid downstream compliance reasoning errors, so further improvement is needed. In this paper, we therefore present an ontology-based TC algorithm that improves classification performance by using the semantic features of the text. We used a domain ontology to conceptualize the environmental knowledge. Unlike the ML-based approach, our ontology-based approach represents a document (or clause) in terms of semantic concepts and relations, rather than just terms (words). The semantic concepts and relations in the ontology (e.g., "is-a" relations) help in recognizing the semantic features of the text. Our ontology-based TC algorithm was tested on twelve environmental regulatory documents, including the 2012 International Energy Conservation Code, evaluated in terms of precision and recall, and compared with our previously used ML-based approach. Our results show that the ontology-based approach achieves 96.62% recall and 96.34% precision, thereby outperforming the ML-based approach.
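
The idea of representing a clause by ontology concepts rather than raw words can be illustrated with a minimal sketch. This is not the algorithm from the paper: the toy ontology, the term lexicon, the category name, and all function names below are hypothetical and chosen only to show how "is-a" relations let a clause match a category it never mentions literally, and how precision and recall are then computed.

```python
# Hypothetical toy ontology: child concept -> parent concept ("is-a").
IS_A = {
    "insulation": "building_envelope_component",
    "fenestration": "building_envelope_component",
    "building_envelope_component": "energy_efficiency_concept",
    "lighting_power": "energy_efficiency_concept",
}

# Hypothetical lexicon mapping surface terms to ontology concepts.
TERM_TO_CONCEPT = {
    "insulation": "insulation",
    "window": "fenestration",
    "skylight": "fenestration",
    "lighting": "lighting_power",
}


def ancestors(concept):
    """Return the concept together with every concept reachable via is-a."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        concept = IS_A.get(concept)
    return seen


def concepts_in(clause):
    """Represent a clause as the set of concepts (and ancestors) it evokes."""
    tokens = clause.lower().replace(",", " ").replace(".", " ").split()
    found = set()
    for token in tokens:
        concept = TERM_TO_CONCEPT.get(token)
        if concept is not None:
            found |= ancestors(concept)
    return found


def classify(clause, category_concept):
    """Label a clause as relevant if it evokes the category concept."""
    return category_concept in concepts_in(clause)


def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In this sketch, a clause such as "Skylight area shall not exceed 3 percent." is classified under the hypothetical `energy_efficiency_concept` category because "skylight" maps to `fenestration`, which is linked to that category through two "is-a" steps, even though no word of the category name appears in the clause.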
