Abstract

Timely and accurate identification of engineering change documents is important to document management in the Architecture, Engineering, and Construction (AEC) industry because it facilitates decision making. Current practice relies on manual review and analysis, which can delay or omit information during transmission and, in turn, severely affect project schedule and cost. This paper uses text classification to explore the intelligent identification of unstructured engineering change documents. Two challenges arise in practice. First, the available corpus is scarce and the number of engineering change texts is limited. Second, the semantic information of the texts is difficult to exploit fully; in particular, engineering change texts contain unregistered words (words absent from the vocabulary) that interfere with semantics. To tackle these problems, we propose a compositional semantic representation (CSR) and develop an SVM-based method named CSR-SVM. We introduce a language model to produce word embeddings and establish a domain dictionary to handle unregistered words. The embeddings are then used in CSR, which incorporates both a key semantic representation and a global semantic representation. The former is obtained from dependency parsing and word embeddings; the latter is obtained from the representations of all words in a text. CSR thus provides the SVM with a sufficient semantic representation in an efficient way. The advantages of CSR-SVM are validated by experiments on a real-world dataset.
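
To make the composition step concrete, the following is a minimal sketch of how a key representation and a global representation might be combined into one feature vector. It is not the paper's implementation: the abstract does not state how the two parts are pooled or joined, so mean pooling and concatenation are assumptions here. The names `lookup`, `mean_vector`, and `compose_csr` are hypothetical, the toy embeddings are invented for illustration, the domain dictionary is assumed to map an unregistered word to a known term, and the key tokens are picked by hand where the paper would derive them from dependency parsing.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def lookup(word: str, embeddings: Dict[str, Vector],
           domain_dict: Dict[str, str]) -> Optional[Vector]:
    """Return the embedding of a word, or None if it cannot be resolved.

    Assumption: the domain dictionary maps an unregistered word to a
    known term whose embedding is used in its place.
    """
    if word in embeddings:
        return embeddings[word]
    alias = domain_dict.get(word)
    if alias is not None:
        return embeddings.get(alias)
    return None


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector of length dim if the input is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def compose_csr(tokens: Sequence[str], key_tokens: Sequence[str],
                embeddings: Dict[str, Vector], domain_dict: Dict[str, str],
                dim: int) -> Vector:
    """Concatenate a key representation and a global representation.

    Assumption: both parts are mean-pooled word embeddings. Words that
    cannot be resolved are skipped.
    """
    def embed(words: Sequence[str]) -> List[Vector]:
        found = (lookup(w, embeddings, domain_dict) for w in words)
        return [v for v in found if v is not None]

    key_repr = mean_vector(embed(key_tokens), dim)    # from key words only
    global_repr = mean_vector(embed(tokens), dim)     # from every word
    return key_repr + global_repr                     # list concatenation


# Toy data, invented for illustration only.
embeddings = {"beam": [1.0, 0.0], "change": [0.0, 1.0], "size": [1.0, 1.0]}
domain_dict = {"girder-b2": "beam"}           # unregistered word -> known term
tokens = ["girder-b2", "size", "change"]
key_tokens = ["girder-b2", "change"]          # stand-in for dependency parsing

csr = compose_csr(tokens, key_tokens, embeddings, domain_dict, dim=2)
print(csr)
```

The resulting vector has twice the embedding dimension and could be passed as one row of the feature matrix to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`; the choice of kernel and hyperparameters is left open here because the abstract does not specify it.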
